About Text Markup in RPM

Introduction

Recently a customer asked us:

“I found the ASA to TEXT MARKUP transform, but what do I do with the output? It looks like HTML.”

Actually, the output is not HTML. It’s an internal format we developed that we call “rf5”. If I had known about JSON at the time (see json.org, https://en.wikipedia.org/wiki/JSON), I would have used that instead.

“rf5” stands for “RPM format version 5”, because I designed it for the transition from RPM 4.5 to RPM 5.0. The basic idea is that RPM had many input formats coming in and many different types of output going out.

For example, we support a variety of plain text formats, plus ASA and SCS (both of which seem to come from IBM systems). We might convert any of those to PDF or to cleaned-up plain text, print them on a Windows printer, or convert them to PCL or HTML.

If we wrote a separate conversion program for every combination of input and output, here’s how many conversion programs we would need:

(number of input types) × (number of output types)

For the three input types and five output types listed above, that’s 15 programs. Adding just one more input type would mean five more conversion programs.

By converting every input to an intermediate format first, we need only one converter per input type plus one per output type (3 + 5 = 8 instead of 3 × 5 = 15). That cut way back on the amount of conversion code in RPM, and it made the user interface far simpler.
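To make the arithmetic concrete, here is a minimal sketch of the hub-and-spoke layout, assuming a hypothetical intermediate representation of "pages of lines". The names `PARSERS`, `RENDERERS`, `convert`, and the stand-in functions are illustrative only and are not RPM's internals; only the plain text parser and renderer do real work.

```python
from typing import Callable

# Hypothetical intermediate representation: a list of pages, each a list of lines.
Markup = list[list[str]]

def parse_plain(data: str) -> Markup:
    return [data.splitlines()]

def render_plain(doc: Markup) -> str:
    # Join lines with LF and pages with a form feed.
    return "\f".join("\n".join(page) for page in doc)

def not_written_here(*_args):
    # Placeholder for converters this sketch does not implement.
    raise NotImplementedError("placeholder in this sketch")

# One parser per input type and one renderer per output type.
PARSERS: dict[str, Callable] = {
    "plain": parse_plain, "asa": not_written_here, "scs": not_written_here,
}
RENDERERS: dict[str, Callable] = {
    "plain": render_plain, "pdf": not_written_here, "windows": not_written_here,
    "pcl": not_written_here, "html": not_written_here,
}

def convert(data: str, source: str, target: str) -> str:
    # Every conversion passes through the intermediate format.
    return RENDERERS[target](PARSERS[source](data))

direct = len(PARSERS) * len(RENDERERS)   # one program per input/output pair
via_hub = len(PARSERS) + len(RENDERERS)  # one converter per format
```

Adding a sixth output format means writing one renderer, not three new programs.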

Inputs to text markup

Our customers give RPM a wide variety of text formats. If all our plain text input were 80-column lines on 60-line pages, we’d have nothing to worry about and you wouldn’t be reading this page.

As it is, the kinds of inputs we handle include:

  • Text with a variety of line endings
  • Text with optional segments, used for bold or overstrike
  • Text with nulls or other special characters

We run into a lot of text from Linux systems with a few extra carriage returns mixed in with the line feeds. Rather than interpret those as multiple lines, the text markup scanner recognizes that they end a single line.
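One way to get that behavior is to treat the line feed as the only line terminator and absorb any carriage returns immediately before it. The sketch below assumes exactly that rule; `split_lines` is a hypothetical name, not RPM's scanner. A lone carriage return in the middle of a line is deliberately left alone, since it can mark an overstrike segment.

```python
import re

def split_lines(data: str) -> list[str]:
    # A line ends at LF; any CRs right before the LF are absorbed,
    # so "\r\n" and "\r\r\n" both end exactly one line.
    lines = re.split(r"\r*\n", data)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after a final line ending
    return lines
```

For example, `split_lines("one\r\r\ntwo\r\n")` yields two lines, not four.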

Another Linux artifact we run across is the use of a backspace character to produce overstrikes. RPM handles that, as well as lines built from segments separated by carriage returns and ending in a single line feed.
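Both overstrike styles can be modeled as a print head moving over columns: a backspace steps back one column, and a carriage return returns to column 0 so the next segment prints over the first. The sketch below is a hypothetical interpretation, not RPM's implementation. It treats a column struck more than once with the same character as bold, keeps the first non-blank character in a column otherwise, and does not model underlining.

```python
def overlay(line: str) -> list[tuple[str, bool]]:
    """Collapse backspace and CR overstrikes into (char, bold) cells."""
    columns: list[list[str]] = []
    col = 0
    for ch in line:
        if ch == "\b":
            col = max(col - 1, 0)      # backspace: step back one column
        elif ch == "\r":
            col = 0                    # CR: next segment starts at column 0
        else:
            while len(columns) <= col:
                columns.append([])
            columns[col].append(ch)    # record every character struck here
            col += 1
    cells = []
    for strikes in columns:
        ink = [c for c in strikes if c != " "]      # spaces leave no mark
        char = ink[0] if ink else " "
        bold = len(ink) > 1 and len(set(ink)) == 1  # same char struck twice
        cells.append((char, bold))
    return cells
```

With this model, trailing carriage returns such as `"abc\r\r"` simply print nothing, which matches the single-line treatment described earlier.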

We also handle stylized text such as ASA, where the first character of each line may be a command to:

  • Overstrike all or part of this line with another input line
  • Write this line out as is
  • Insert one or more blank lines
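As an illustration, the sketch below interprets the standard ASA carriage-control characters: space (single space), `0` (double space, one blank line), `-` (triple space, two blank lines), `+` (overprint the previous line), and `1` (skip to the top of a new page). The function name, the pages/lines/layers output shape, and treating unknown codes as a single space are assumptions for this sketch, not RPM's behavior.

```python
# Lines each ASA control character advances the paper before printing.
ASA_ADVANCE = {" ": 1, "0": 2, "-": 3}

def interpret_asa(records: list[str]) -> list[list[list[str]]]:
    """Return pages -> lines -> overstrike layers for ASA records."""
    pages: list[list[list[str]]] = [[]]
    for record in records:
        control, text = (record[:1] or " "), record[1:]
        page = pages[-1]
        if control == "+" and page:
            page[-1].append(text)          # overstrike the previous line
            continue
        if control == "1":
            if page:                       # skip to the top of a new page
                pages.append([])
                page = pages[-1]
            page.append([text])
            continue
        # ' ', '0', '-' (and, in this sketch, unknown codes) advance the
        # paper; each advance beyond the first leaves a blank line.
        for _ in range(ASA_ADVANCE.get(control, 1) - 1):
            page.append([""])
        page.append([text])
    return pages
```

Keeping overstrikes as layers, rather than merging them immediately, leaves the bold/overstrike decision to a later step.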

FCFC is a recent addition; it is similar to ASA but offers more command options.

SCS is an EBCDIC format from IBM systems that includes dozens of formatting codes. RPM converts SCS input to text markup. In fact, the full feature list for text markup is based on the capabilities of SCS input: for instance, setting the font name and size, turning bold on and off, adjusting the font cell, and quite a few more.
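For a feel of what an SCS parser starts with, here is a minimal sketch that handles only two SCS control codes, New Line (X'15') and Form Feed (X'0C'), and decodes the remaining bytes with Python's `cp037` codec. Two assumptions apply: the input uses EBCDIC code page 37 (other jobs may use other code pages), and the input contains only text, NL, and FF. Every other SCS control, including the font and bold settings mentioned above, is out of scope, and `scs_to_pages` is a hypothetical name.

```python
SCS_NL = 0x15  # New Line
SCS_FF = 0x0C  # Form Feed

def scs_to_pages(data: bytes) -> list[list[str]]:
    """Split a text-only SCS stream into pages of decoded lines."""
    pages: list[list[str]] = [[]]
    line = bytearray()
    for b in data:
        if b == SCS_NL:
            pages[-1].append(line.decode("cp037"))  # end the current line
            line.clear()
        elif b == SCS_FF:
            if line:                                # flush a partial line
                pages[-1].append(line.decode("cp037"))
                line.clear()
            pages.append([])                        # start a new page
        else:
            line.append(b)                          # ordinary EBCDIC text byte
    if line:
        pages[-1].append(line.decode("cp037"))
    return pages
```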

Text markup outputs

Most often we convert incoming text to something graphical: for instance, a PDF, print-ready PCL, or printed output sent through the Windows interface to your printers.

Our goal is for a text markup file made from your input to look the same in a PDF document as it does on your printer, assuming the same fonts are available and the margins and page dimensions are basically compatible.

More text markup conversions

RPM offers a COR transform. We worked with a number of customers who used a similarly named function on their IBM systems. The COR transform takes text markup input (something already produced by another transform) and normalizes the lines per page, the line length, and the page orientation.

The Text to Text Markup transform also supports matching line lengths automatically to a target output line length and changing the page orientation if necessary.
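The sketch below shows one simple way to normalize lines per page and line length: flatten the pages, hard-split any line longer than the target width, and re-chunk the result into pages of the target length. This is a hypothetical illustration of the idea, not how RPM's COR or Text to Text Markup transforms work. It deliberately avoids word wrapping so that columns in report output stay aligned, and it ignores the original page breaks.

```python
def repaginate(pages: list[list[str]], lines_per_page: int,
               line_length: int) -> list[list[str]]:
    """Hypothetical: re-flow pages to a target page length and line width."""
    lines: list[str] = []
    for page in pages:
        for line in page:
            # Hard-split long lines at line_length (no word wrapping).
            pieces = [line[i:i + line_length]
                      for i in range(0, len(line), line_length)]
            lines.extend(pieces or [""])  # keep blank lines as blank lines
    # Re-chunk the flattened lines into pages of the target length.
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]
```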

A number of customers use the “Remove Text Markup” transform to make a plain text version of their original input.

Text printing depends on text markup

Notably, the Text Print action relies on text markup to produce a proper print of your data. If at all possible, configure your workflow to send text markup data to the Text Print action.

When the Text Print action receives plain text instead, it runs a text markup conversion inline, parsing the data with default settings. These are the same defaults you see when you first add a Text to Text Markup transform. They include:

  • Half inch margins
  • Calculate a font size and auto-rotate the page orientation, using a max line length of 90 characters
  • Use the longest line of the input for these calculations
  • Use Courier New font, 12 point
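To see how these defaults interact, here is a sketch of a font-size and orientation calculation under stated assumptions: US Letter paper (8.5 × 11 in), half-inch margins, a monospaced font whose characters are about 0.6 em wide (true of Courier New to within rounding), and 12 point as the largest size allowed. It picks whichever orientation allows the larger font, preferring portrait on a tie. `fit_font` and its rule are hypothetical, not RPM's algorithm.

```python
POINTS_PER_INCH = 72
CHAR_WIDTH_EM = 0.6  # approximate advance width of Courier New (assumption)

def fit_font(line_length: int, page_w_in: float = 8.5, page_h_in: float = 11.0,
             margin_in: float = 0.5, max_size: float = 12.0) -> tuple[str, float]:
    """Return (orientation, font size in points) that fits line_length chars."""
    best: tuple[str, float] | None = None
    for orientation, width_in in (("portrait", page_w_in), ("landscape", page_h_in)):
        printable_pt = (width_in - 2 * margin_in) * POINTS_PER_INCH
        size = min(max_size, printable_pt / (line_length * CHAR_WIDTH_EM))
        if best is None or size > best[1]:  # strict: ties keep portrait
            best = (orientation, size)
    return best
```

With a 90-character line, portrait allows only 540 / 54 = 10 pt, while landscape allows 720 / 54 ≈ 13.3 pt, capped at 12, so `fit_font(90)` chooses landscape at 12 pt.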

Conclusion

We hope this article helps you see how to use the text markup conversions in your workflow, if they suit your situation.

We invite you to talk with our technical staff about your questions and ideas.