About Text Markup in RPM

Submitted by Dave Brooks on Mon, 10/29/2018 - 20:13

What is text markup?

Text markup is a basic markup language I created for RPM Remote Print Manager® ("RPM") to describe the elements of a print job. I've included the story behind text markup to help you understand why it matters and how to use it.

When I first created RPM, I included a function that would let you very quickly send text jobs to a Windows printer. I also assumed that your text was not formatted, so I did the formatting for you.

You set the page margins.
You set the lines per page (typically 60) and the columns per line (typically 80), though either can be any value.
You select a Windows printer.
You select any font available for that printer.

Then, whenever RPM printed a job, it analyzed the size of the paper loaded in your paper tray and the font you chose. Next, RPM automatically scaled the font so your lines and columns fit the page. Finally, it did all the calculations to make your print job appear just as you specified. And it all took very few clicks.
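To make the scaling step concrete, here is a minimal Python sketch of one way to fit a font to the page. It is not RPM's actual code. It assumes a monospaced font whose characters are 0.6 em wide (true of Courier) and a line height equal to the point size; the function name `fit_font_size` and its parameters are hypothetical. The idea is to compute the largest point size that fits the requested lines vertically and the requested columns horizontally, then take the smaller of the two.

```python
def fit_font_size(page_width_pt: float, page_height_pt: float, margin_pt: float,
                  lines_per_page: int, cols_per_line: int,
                  char_width_em: float = 0.6, line_height_em: float = 1.0) -> float:
    """Largest point size that fits the requested grid on the page (hypothetical sketch)."""
    # Printable area after removing the same margin on every side.
    usable_width = page_width_pt - 2 * margin_pt
    usable_height = page_height_pt - 2 * margin_pt
    # Assumption: each line is line_height_em * size tall and each
    # character is char_width_em * size wide (a monospaced font).
    size_for_lines = usable_height / (lines_per_page * line_height_em)
    size_for_cols = usable_width / (cols_per_line * char_width_em)
    # The smaller size is the one that satisfies both constraints.
    return min(size_for_lines, size_for_cols)


# US Letter is 612 x 792 points; half-inch margins are 36 points.
print(fit_font_size(612, 792, 36, 60, 80))  # 11.25
```

With 60 lines and 80 columns on Letter paper, width is the limiting factor here, so the sketch settles on 11.25 points rather than the 12 points the height alone would allow.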

This feature was very popular. I first released RPM before student registration at a major university. Dozens of campus staff used that first release without special training, and we cut student wait times by printing class schedules on demand, wherever the students were.

This particular print function in RPM had one input:

Plain text of any format

And we had one output for this text:

a Windows printer

RPM performed the printing process I just described in one giant step: it read the file, analyzed the data, completed all the calculations, and printed everything.

When it came time to expand RPM to do new things with print data, I quickly realized that this one-step approach would not scale. So I came up with a markup language that describes the elements of print data. That is text markup.

Once RPM reads print data and generates text markup, we can move to the next step, creating one of several outputs.
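The benefit of this design is that each input needs only one reader that produces text markup, and each output needs only one writer that consumes it, so adding a format means writing one new piece rather than one converter per input/output pairing. The Python sketch below illustrates that shape. The `MarkedUpJob` and `Page` classes and the reader/writer names are hypothetical stand-ins; RPM's real text markup carries much more than pages of lines, and its format is not shown here.

```python
import html
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    lines: list[str] = field(default_factory=list)


@dataclass
class MarkedUpJob:
    """Hypothetical stand-in for text markup: just pages of lines."""
    pages: list[Page] = field(default_factory=list)


def read_plain_text(data: str) -> MarkedUpJob:
    """Input side: split plain text into pages (at form feeds) and lines."""
    job = MarkedUpJob()
    for raw_page in data.split("\f"):
        lines = raw_page.split("\n")
        if lines[-1] == "":
            lines.pop()  # drop the empty piece after a trailing newline
        job.pages.append(Page(lines))
    return job


def write_html(job: MarkedUpJob) -> str:
    """Output side: one <pre> block per page."""
    blocks = []
    for page in job.pages:
        body = html.escape("\n".join(page.lines))
        blocks.append(f"<pre>{body}</pre>")
    return "\n".join(blocks)


def write_text(job: MarkedUpJob) -> str:
    """Output side: lines ending in newlines, pages separated by form feeds."""
    return "\f".join("".join(line + "\n" for line in p.lines) for p in job.pages)


# Every reader produces the same intermediate form; every writer consumes it.
READERS: dict[str, Callable[[str], MarkedUpJob]] = {"text": read_plain_text}
WRITERS: dict[str, Callable[[MarkedUpJob], str]] = {"html": write_html, "text": write_text}


def convert(data: str, input_format: str, output_format: str) -> str:
    return WRITERS[output_format](READERS[input_format](data))
```

Adding, say, an ASA reader would mean one new entry in `READERS`, and it would immediately work with every writer.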

Outputs for text markup

This discussion of outputs applies only to marked-up text, that is, text with features you have some control over. Binary or print-ready formats are a different conversation, and RPM has separate features for those.

To our original output, a Windows printer, we have added the following:

convert to PDF
convert to PCL, a common HP printer language
convert to HTML

Strictly speaking, these are not outputs; they are closer to "goals," because in each case I aim to represent your printed page accurately and consistently. For example, a PDF should look as close as possible to the printed page.

The one true output is still a Windows printer. Each of the formats above produces a file, and RPM can:

save the file to disk
attach it to an email
upload it to an FTP server
send it via TCP/IP to a networked printer

RPM can also run a program on your file. For instance, you could use the "curl" program to send files over protocols that RPM does not support internally.
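For sending a file via TCP/IP to a networked printer, one widely used method is raw printing: open a TCP connection to the printer, usually on port 9100 (often called JetDirect or AppSocket printing), write the file's bytes, and close the connection. This post doesn't say which protocol RPM uses for a given printer, so treat this Python sketch as an illustration of the idea only; `send_raw` is a hypothetical name, and 9100 is just the common default port.

```python
import socket
from pathlib import Path


def send_raw(path: str, host: str, port: int = 9100, timeout: float = 30.0) -> int:
    """Send a print-ready file to a printer as raw bytes over TCP (hypothetical helper)."""
    data = Path(path).read_bytes()
    # 9100 is the conventional raw-printing port; your printer may use another.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
    # Leaving the with-block closes the connection, which for raw printing
    # typically tells the printer the job is complete.
    return len(data)


# Example (placeholder address): send_raw("job.pcl", "192.0.2.10")
```

Note that the printer must understand the file's format (PCL, for example); raw printing passes the bytes through unchanged.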

Inputs we translate to text markup

To our original input, plain text, we've added the following:

SCS, a binary page format created by IBM. It's relatively complex, and our text markup seeks to incorporate all its features.
ASA, once widely known as "Fortran printing." In the early Fortran days, we controlled printed output with ASA carriage-control codes placed in the first column of each line.
FCFC, a specialty format similar to ASA but less widely used.
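ASA carriage control puts one control character in the first column of each line, and the printer acts on it before printing the rest of the line: a space advances one line, `0` advances two, `-` advances three, `1` starts a new page, and `+` prints over the previous line without advancing. The Python sketch below converts such lines to plain text with form feeds and carriage returns, roughly following the POSIX `asa` utility plus `-` for triple spacing. It is an illustration, not RPM's ASA transform, and `asa_to_text` is a hypothetical name.

```python
def asa_to_text(lines: list[str]) -> str:
    """Convert ASA carriage-control lines to plain text (illustrative sketch)."""
    out: list[str] = []
    for line in lines:
        control, body = line[:1], line[1:]
        if control == "+" and out:
            # Overprint: replace the previous line's newline with a carriage return.
            out[-1] = out[-1][:-1] + "\r"
            out.append(body + "\n")
        elif control == "0":
            out.append("\n" + body + "\n")    # double space: one blank line first
        elif control == "-":
            out.append("\n\n" + body + "\n")  # triple space: two blank lines first
        elif control == "1":
            out.append("\f" + body + "\n")    # start a new page
        else:
            out.append(body + "\n")           # space (or unknown code): single space
    return "".join(out)


print(repr(asa_to_text(["1TITLE", " line a", "0line b", "+_____"])))
# '\x0cTITLE\nline a\n\nline b\r_____\n'
```

The `+` line shows how ASA produces the overprinting that RPM later interprets: "line b" is followed by a carriage return and then underscores printed over it.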

We've talked a lot about plain text, but RPM uses a broad definition for "plain." Here is a collection of rules RPM uses for analyzing text.

RPM treats carriage return and line feed independently. A line feed always means “end of this line, start a new line.”
A carriage return means "go back to the beginning of the line." If the carriage return is followed by a line feed, as it often is, RPM simply treats the pair as the usual carriage return/line feed line ending. Otherwise, the carriage return means the text that follows will print over portions of the current line.
Another way to repeat text is with a backspace character. RPM understands the two most widely used: Ctrl+B and Ctrl+H.
When text repeats, there are two possibilities:
- The text is the same in each instance, in which case the user wants bold text.
- The text is different, in which case the user wants overstrike.
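Here is a minimal Python sketch of these rules applied to plain text. It tracks a column position, moves back to column 0 on a carriage return and back one column on a backspace (Ctrl+H or Ctrl+B, as listed above), and records every non-space character struck at each column. A column struck more than once with the same character is classified as bold, and one struck with different characters as overstrike. The function names are hypothetical, this is not RPM's code, and real print data (tabs, for example) would need more handling.

```python
# Backspace characters, per the article: Ctrl+H (0x08) and Ctrl+B (0x02).
BACKSPACES = {"\x08", "\x02"}


def overprint_cells(line: str) -> list[list[str]]:
    """Return, for each column, every non-space character struck there."""
    cells: list[list[str]] = []
    col = 0
    for ch in line:
        if ch == "\r":
            col = 0                      # back to the start of the line
        elif ch in BACKSPACES:
            col = max(col - 1, 0)        # back up one column
        else:
            while len(cells) <= col:
                cells.append([])
            if ch != " ":                # a space strikes nothing
                cells[col].append(ch)
            col += 1
    return cells


def classify(cell: list[str]) -> str:
    if not cell:
        return "blank"
    if len(cell) == 1:
        return "normal"
    if all(c == cell[0] for c in cell):
        return "bold"                    # same character repeated
    return "overstrike"                  # different characters on top of each other


def analyze_line(line: str) -> list[tuple[str, str]]:
    result = []
    for cell in overprint_cells(line):
        style = classify(cell)
        glyphs = cell[0] if style == "bold" else ("".join(cell) or " ")
        result.append((glyphs, style))
    return result


def analyze_text(text: str) -> list[list[tuple[str, str]]]:
    # A line feed always ends the line. A CR left over from a CR/LF pair just
    # returns to column 0 with nothing after it, so it changes nothing.
    return [analyze_line(line) for line in text.split("\n")]


print(analyze_line("H\bHi"))   # [('H', 'bold'), ('i', 'normal')]
print(analyze_line("ab\r__"))  # [('a_', 'overstrike'), ('b_', 'overstrike')]
```

Because a trailing carriage return has no effect here, the CR/LF case needs no special code: splitting on the line feed is enough.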

Smoothing text markup

We used to have a guy in the company who worked with AS/400 systems. One day he asked if RPM used COR. He explained that COR is an optional AS/400 feature that applies a set of rules to improve your printed output, making your pages look more “full” without stretching or shrinking the font.

COR stands for “computer output reduction.” I never understood that title. We found the rules for COR online and implemented them.

If you want to use COR, add a COR transform immediately after whichever transform you used to produce text markup.

AutoCalc

Another option for text markup is AutoCalc, which finds the longest line and the longest page (pages are separated by form feeds) and sets the columns per line and lines per page to those values. This can easily produce awful-looking results if your line or page lengths are inconsistent. However, if your lines and pages are fairly uniform but not perfect, AutoCalc may be a good option for you.
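As a rough illustration of that calculation (not RPM's code), this Python sketch splits the data into pages at form feeds, measures every line, and returns the longest line length and the largest page length. The name `autocalc` is hypothetical; the sketch ignores trailing whitespace and does not account for overprinting, which a real implementation would likely need to consider.

```python
def autocalc(text: str) -> tuple[int, int]:
    """Return (columns per line, lines per page) from the longest line and page."""
    max_cols = 0
    max_lines = 0
    for page in text.split("\f"):            # pages are separated by form feeds
        lines = page.split("\n")
        if lines[-1] == "":
            lines.pop()                      # ignore the empty piece after a final newline
        max_lines = max(max_lines, len(lines))
        for line in lines:
            # Assumption: trailing whitespace (including a CR from CR/LF) doesn't count.
            max_cols = max(max_cols, len(line.rstrip()))
    return max_cols, max_lines


print(autocalc("abc\nabcdef\n\fxy\nz\nw\n"))  # (6, 3)
```

The same sketch shows why inconsistent data is risky: `autocalc("short\n" * 59 + "x" * 200 + "\n")` returns `(200, 60)`, so one 200-character line would force 200 columns, and a much smaller font, on every page.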

AutoCalc and text print

One more note on AutoCalc: if you do a text print and the text print action sees that you are passing in plain text rather than text markup, it creates a text markup transform, sets AutoCalc to true, and passes all your print data through that transform.

This may or may not be a good solution for you, depending on your print data. Our advice is to make sure you use a text markup transform of one kind or another when you print text to a Windows printer in RPM.