
UnForm recognizes PostScript and PDF input streams and attempts to process them as pre-formatted print jobs.  This requires Ghostscript 9.06 or higher to be configured.  UnForm uses Ghostscript both to extract text and to convert the job to PDF pages.  If the optional MuPDF tool is installed and configured, text extraction uses MuPDF instead.
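As an illustration of how a print stream can be classified by its leading bytes, the following Python sketch checks for the standard PostScript ("%!") and PDF ("%PDF-") file signatures.  It is not UnForm's own detection logic; the function name and the choice to skip a leading Ctrl-D byte (which some PostScript drivers send) are assumptions made for the example.

```python
def sniff_print_stream(data: bytes) -> str:
    """Classify a print stream as 'pdf', 'postscript', or 'text'.

    Hypothetical helper: it only inspects the standard file signatures.
    """
    # Assumption: ignore leading whitespace and a Ctrl-D (0x04) prefix.
    head = data[:1024].lstrip(b"\x04\r\n\t ")
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"%!"):
        return "postscript"
    return "text"


if __name__ == "__main__":
    print(sniff_print_stream(b"%!PS-Adobe-3.0\n%%Pages: 1\n"))
    print(sniff_print_stream(b"%PDF-1.4\n"))
    print(sniff_print_stream(b"INVOICE 1001\n"))
```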


Caveat: Not all such print jobs contain text.  Some contain images of text, and some contain a mixture of text and images of text.  Only true text data can be used by UnForm as rule set data.  In addition, text elements sometimes include large regions of clear space around the text itself, which makes parsing text by location more difficult.  The availability and usefulness of text is determined by the printing application and Ghostscript, not UnForm.


The Design Tool can submit PostScript or PDF sample data to the UnForm server, and highlight each text element found on each page (by checking the Add Text Base option on the Preview menu).


The technique used by UnForm when it receives a PostScript or PDF print stream is to generate an overlay of each page in the output format of the job.  UnForm graphical commands can then be used to add elements to this overlay, to scale it, and to erase regions from it.  Other than selective erasure, it is not possible to modify the overlay.  In many cases, there will be very limited, if any, cosmetic enhancements needed, allowing the implementer to focus exclusively on document management features such as electronic delivery and archiving.


The most common method of integrating AFO printing with UnForm is to use a Windows printer configured with a PostScript print driver and a TCP/IP port directed at an UnForm TCP/IP monitor.


The -noafo command line option can be used to suppress AFO processing for PostScript/PDF input.  It may also be useful as an argument passed to a subjob in a jobexec() function, because subjobs of AFO jobs are treated as AFO jobs themselves by default.


The -afo2 command line option selects a different text parsing algorithm, which attempts to parse individual words from the print stream data; print streams often deliver phrases rather than single words.  This can affect detection logic and the gtext*() functions, so a rule set designed for one mode may not work in the other mode.  This newer mode is used by the Image Manager when calculating word positions for zonal OCR text extraction.
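The difference between phrase-level and word-level text elements can be illustrated with a short Python sketch.  This is not the -afo2 algorithm, whose details are not documented here; the Element class, the split_phrase() helper, and the assumption that every character in a phrase has the same width are all hypothetical and used only to show why word positions must be estimated when a print stream reports whole phrases.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def split_phrase(el: Element) -> list:
    """Split a phrase element into word elements.

    Assumption: characters are evenly spaced, so each word's position is
    estimated from its character offset within the phrase.
    """
    if not el.text:
        return []
    per_char = el.width / len(el.text)
    words = []
    pos = 0
    for word in el.text.split():
        start = el.text.index(word, pos)
        words.append(Element(word, el.x + start * per_char, el.y,
                             len(word) * per_char, el.height))
        pos = start + len(word)
    return words


if __name__ == "__main__":
    phrase = Element("Invoice No: 1001", 100.0, 50.0, 160.0, 12.0)
    for w in split_phrase(phrase):
        print(w.text, w.x, w.width)
```

Because the word positions are estimates, a rule that locates "1001" by position in one mode may need a different region in the other mode.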


Normally, when Ghostscript-created words are parsed, UnForm will attempt to sort them in a top-down, left-to-right order, but this algorithm can produce odd results under some circumstances with mixed character heights.  An alternative is to set the uf100d.ini [defaults] afogsorder flag to 1, which suppresses this second sort, and accepts the order in which Ghostscript reported the words.  Ghostscript versions after 9.12 produce more reliable results than older versions.
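To show why a top-down, left-to-right sort can give odd results with mixed character heights, here is a minimal Python sketch.  It is an assumed, simplified model and not UnForm's sorting code: words are (text, left, top, height) tuples, and words whose top edges fall within a hypothetical line_tolerance are grouped into one line.

```python
def reading_order(words, line_tolerance=2.0):
    """Return word texts sorted top-down, then left-to-right.

    words: iterable of (text, left, top, height) tuples.
    Words are grouped into a line when their top edge is within
    line_tolerance of the first word placed on that line.
    """
    ordered = sorted(words, key=lambda w: (w[2], w[1]))
    lines = []
    for w in ordered:
        if lines and abs(w[2] - lines[-1][0][2]) <= line_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [w[0] for line in lines for w in sorted(line, key=lambda w: w[1])]


if __name__ == "__main__":
    # All three words share a baseline, but the amount is printed in a
    # taller font, so its top edge is higher on the page.
    words = [("Total:", 100, 200, 10),
             ("$1,250.00", 160, 194, 16),
             ("Due", 250, 200, 10)]
    print(reading_order(words))
```

In this example the taller word sorts ahead of the words on either side of it, even though it appears between them visually.  Accepting the order in which the words were reported avoids that kind of reshuffling.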


As the PDF pages are treated similarly to overlays, the orientation of the UnForm job must match that of the PostScript or PDF input.  For example, if the input uses landscape orientation, the UnForm rule set should include a landscape command.


Text vs. PostScript/PDF Print Stream Management

When working with plain text input, UnForm has commands that manipulate or apply enhancements to a text print stream, such as font, bold, and erase.  Also, code blocks can manipulate the text$[] array, resulting in modified print stream text.  However, when working with PostScript print streams, there is no text array, and commands that depend on it are not available.  One exception is erase, which is translated to be a shade command with a shade value of 0, resulting in erasure of the specified region of the overlay.  Also, the notext command and its new synonym, nooverlay, may be used to suppress printing of the overlay on any copy or all pages.


The following commands are not compatible with AFO:

any Zebra- or label-only commands


In addition, many commands support anchor text or patterns, which cause a search of the text content of the page to locate positions to apply enhancements.  Supported commands that offer this feature, such as barcode, box, and text, continue to support the anchor search technique.  However, since the location of a PostScript text region does not always correspond to the visual location or size of the text, accuracy can vary.


The margin command is honored (starting in version 10), to adjust the positioning and scaling of the overlay in cases where passed-through print data is outside of the printable margin of paper.


If AFO text regions vary from visual location or size, then detection logic may require greater flexibility than with simple text input streams.  The detect command has been enhanced to support partial columns and rows, but it may be necessary in some cases to detect elements from the whole page rather than regions.
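The idea of more flexible detection can be sketched in Python as follows.  This is only a conceptual model and does not reproduce the detect command's syntax or matching rules: elements are (text, left, top, width, height) tuples, and a match is accepted when a hypothetical min_overlap fraction of the element's area lies inside the detection region, rather than requiring the element to fit entirely within it.

```python
import re


def overlap_fraction(box, region):
    """Fraction of box's area that lies inside region.

    Both arguments are (left, top, width, height).
    """
    bx, by, bw, bh = box
    rx, ry, rw, rh = region
    w = min(bx + bw, rx + rw) - max(bx, rx)
    h = min(by + bh, ry + rh) - max(by, ry)
    if w <= 0 or h <= 0 or bw <= 0 or bh <= 0:
        return 0.0
    return (w * h) / (bw * bh)


def detect(elements, pattern, region, min_overlap=0.5):
    """True if any element matches pattern and sufficiently overlaps region."""
    rx = re.compile(pattern)
    return any(rx.search(text) is not None
               and overlap_fraction(box, region) >= min_overlap
               for text, *box in elements)


if __name__ == "__main__":
    # The text region is larger than the visible word and spills
    # outside the detection region on three sides.
    elements = [("INVOICE", 95, 8, 80, 14)]
    region = (100, 10, 60, 10)
    print(detect(elements, r"INVOICE", region))
    print(detect(elements, r"INVOICE", region, min_overlap=0.9))
```

When even a partial-overlap test is unreliable, searching the whole page corresponds to passing a region that covers the full page.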


Text Array Limitations in Code Blocks and Expressions

Many code block functions that work with a text print stream are also not available.  However, the get() and mget() functions have been enhanced to return text data from the PostScript print stream, and three new functions, gtextcount(), gtextitem(), and gtextfind(), provide access to the text elements parsed from the PostScript print data.  A new variable, nooverlay, can be set to 1 in prepage or precopy code blocks to suppress the printing of the overlay.  This can be used to manage multi-format jobs, such as those with terms and conditions attachments.


The following code block functions are not compatible with AFO:



The arrays text$[], textjob$[], and textpage$[] are not available.


New Functions For Accessing Text


gproperty() returns values from PostScript DSC comments in the print stream.
gtextcount() returns the number of text elements in a page.
gtextitem() returns text and optional region information for a given text element on a page.
gtextfind() searches for patterns in text and returns arrays of text and region information found.
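For readers unfamiliar with PostScript DSC (Document Structuring Conventions) comments, the following Python sketch shows the kind of header data involved.  It is not an implementation of gproperty(); the dsc_comments() helper and the sample job are hypothetical, and the sketch only reads "%%Keyword: value" lines up to the %%EndComments marker.

```python
def dsc_comments(ps_text):
    """Collect '%%Keyword: value' header comments from PostScript text.

    Stops at %%EndComments, which closes the DSC header section.
    """
    props = {}
    for line in ps_text.splitlines():
        if line.startswith("%%EndComments"):
            break
        if line.startswith("%%") and ":" in line:
            key, _, value = line[2:].partition(":")
            props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    sample = "\n".join([
        "%!PS-Adobe-3.0",
        "%%Title: invoice.ps",
        "%%Orientation: Landscape",
        "%%Pages: 3",
        "%%EndComments",
        "%%Page: 1 1",
    ])
    print(dsc_comments(sample))
```

A value such as Orientation is useful when deciding whether a rule set needs a landscape command to match the input.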