Every UnForm installation includes a set of standard filters for use in Image Manager job definitions. The tables below describe each of these standard filters. With the script editing tool, you can copy any of these filters to the filters.custom.ini file and use its code as the basis for a new custom filter.
All filters convert an input string to an output string, and the output string replaces the original string when the filter is applied. Filter code works with two variables, in$ and out$, which hold the input and output data. The in$ variable is populated before the filter code runs, and the code's job is to build out$ with the desired result. The out$ variable always starts as null ("") when the filter code runs.
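UnForm filter code is written in Business BASIC. The Python sketch below only illustrates the same in/out contract, using a hypothetical filter that trims and uppercases each line. The argument stands in for in$, the local out_text stands in for out$, and the line-feed line separator is an assumption of this sketch.

```python
def upper_trim_filter(in_text: str) -> str:
    """Hypothetical filter: trim each line and convert it to uppercase.

    in_text plays the role of in$; out_text plays the role of out$.
    Assumes lines are separated by a line feed (LF) character.
    """
    out_text = ""  # like out$, the output starts as null ("")
    for index, line in enumerate(in_text.split("\n")):
        if index > 0:
            out_text += "\n"  # keep the original line structure
        out_text += line.strip().upper()
    return out_text


print(upper_trim_filter("  acme corp \n 123 main st"))
```

The point is the shape of the contract: nothing is modified in place, and whatever the code leaves in the output variable becomes the filtered value.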
Parameters
Many filters use parameters, which are case-sensitive named values passed inside parentheses as name=value pairs. A parameter value can be a literal value or an expression enclosed in curly braces. Often an expression refers to a job parameter, such as {param.taxheader$}, or to another zone or data field. If a parameter value contains a comma or parenthesis, enclose it in quotes.
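The parameter rules above can be illustrated with a small Python parser. This is a sketch of the described syntax, not UnForm's parser: it splits pairs on commas outside double quotes, strips the quotes, and resolves a value written entirely as {expr} through a plain dictionary, which is a hypothetical stand-in for UnForm's expression evaluation.

```python
def parse_filter_params(param_text: str, context: dict) -> dict:
    """Sketch of splitting 'name=value,name=value' filter parameters.

    - Commas inside double quotes do not split pairs; the quotes are removed.
    - A value written entirely as {expr} is looked up in `context`
      (a hypothetical stand-in for expression evaluation).
    - Names are kept exactly as written, since parameter names are case-sensitive.
    - Commas inside {...} braces are not handled by this sketch.
    """
    pairs, current, in_quotes = [], "", False
    for ch in param_text:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == "," and not in_quotes:
            pairs.append(current)
            current = ""
        else:
            current += ch
    if current:
        pairs.append(current)

    params = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # quoted literal: drop the surrounding quotes
        elif value.startswith("{") and value.endswith("}"):
            value = context.get(value[1:-1], "")  # expression: look it up
        params[name] = value
    return params


print(parse_filter_params('val1=Customer,val2={field.CustID$},dlm=": "',
                          {"field.CustID$": "C100"}))
```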
Chained Filters
In job design, filters can be chained together to achieve an end result, such as removing headers, removing footers, and joining rows. When chained, the output from one filter is used as the input for the next filter in the zone or field definition.
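Chaining is ordinary function composition: each filter's output is fed to the next filter, left to right. The Python sketch below shows this with three hypothetical single-purpose filters. They are illustrations, not UnForm's standard filters.

```python
from functools import reduce


def trim_each_line(text: str) -> str:
    return "\n".join(line.strip() for line in text.split("\n"))


def drop_empty_lines(text: str) -> str:
    return "\n".join(line for line in text.split("\n") if line)


def lines_to_spaces(text: str) -> str:
    return " ".join(text.split("\n"))


def apply_chain(text: str, filters) -> str:
    # The output of each filter becomes the input of the next one.
    return reduce(lambda current, f: f(current), filters, text)


result = apply_chain("  ACME Corp \n\n 123 Main St ",
                     [trim_each_line, drop_empty_lines, lines_to_spaces])
print(result)
```

Order matters: running lines_to_spaces before drop_empty_lines would leave a doubled space where the empty line was.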
Page Headers
Many filters are applied to zone data, either OCR or barcode, which is text lifted from a specific position on one or more pages. This data generally has multiple lines and page headers, so many filters are designed to remove this extra information. A page header is a line that contains only the page number in square brackets, such as [1] or [2]. Many filters remove these page header rows as part of their logic, using one of these functions:
•iszonepagehdr(x$) returns true (1) if a text value, normally a single line, is a page header (the whole value is "[n]")
•removezonepagehdrs(x$) removes all page headers from the string passed to it. The RemovePageHdrRows filter runs this function in a filter context.
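As a rough illustration, the two functions above can be mimicked in Python. The sketch assumes a page header is a line consisting only of digits in square brackets and that lines are separated by a line feed. The Python names are hypothetical equivalents, not UnForm functions.

```python
import re

PAGE_HDR = re.compile(r"\[\d+\]")


def is_zone_page_hdr(value: str) -> int:
    """Mimics iszonepagehdr(): 1 if the whole value is "[n]", else 0."""
    return 1 if PAGE_HDR.fullmatch(value) else 0


def remove_zone_page_hdrs(text: str) -> str:
    """Mimics removezonepagehdrs(): drop every line that is a page header."""
    return "\n".join(line for line in text.split("\n") if not is_zone_page_hdr(line))


zone = "[1]\nWidget A\nWidget B\n[2]\nWidget C"
print(remove_zone_page_hdrs(zone))
```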
Text Filters
These filters are designed for single- or multi-line text that is not structured into columns, such as document identifiers, name and address blocks, subtotals and totals, or dates. This differs from grid data, which is composed of rows of discrete column data. These filters can also be applied to grid column zones, but to keep column values aligned across rows, avoid filters that remove headers or lines in those zones.
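The header-removing line filters in the table below (anyText, firstLine, lastWord and similar) can be approximated in Python as follows. This is a sketch under stated assumptions: whitespace-only lines count as empty, a page header is a line holding only a bracketed page number, and lines are separated by a line feed. The function names are hypothetical.

```python
import re


def _is_page_header(line: str) -> bool:
    # A page header line holds only a bracketed page number, such as "[2]".
    return re.fullmatch(r"\[\d+\]", line.strip()) is not None


def _content_lines(text: str) -> list:
    # Lines that are neither blank nor page headers.
    return [ln for ln in text.split("\n") if ln.strip() and not _is_page_header(ln)]


def any_text(text: str) -> str:
    # anyText: drop page headers, then turn line breaks into spaces.
    return " ".join(ln for ln in text.split("\n") if not _is_page_header(ln))


def first_line(text: str) -> str:
    lines = _content_lines(text)
    return lines[0] if lines else ""


def last_word(text: str) -> str:
    lines = _content_lines(text)
    return lines[-1].split()[-1] if lines else ""


zone = "[1]\n\nInvoice 10452\nDate 03/01/2024\n[2]\n"
print(first_line(zone), "|", last_word(zone))
```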
Name
|
Removes Headers
|
Description
|
anyText
|
Yes
|
Removes page headers and converts line breaks to spaces, so all words are on one line.
|
concat
|
No
|
Concatenates two values, with an optional delimiter. The values and delimiter are passed as parameters.
Parameters:
val1=value
val2=value
dlm=delimiter
Examples:
concat(val1=Customer,val2={field.CustID$},dlm=": ")
|
dateColumn
|
No
|
Formats dates in a multi-line string, returning a string with the same number of lines, with each line either a formatted date or null. This is useful in a grid column with date values.
Parameters:
DateOrder=mdy|dmy|ymd - the order of the date segments, used when parsing the date.
Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders.
Alpha=y|n - if set to y or yes, month names are converted to month numbers before parsing.
Examples:
dateColumn(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)
|
extractPattern
|
Yes
|
Returns the part of a string that matches a regular expression or a simplified pattern. If the regex contains a parenthesized group, the group is returned. UnForm supports Perl-compatible regular expressions (PCRE).
Parameters:
regex=regular-expression (may contain one parenthesized group)
pattern=pattern
The pattern is designed to be simpler to use, but offers less flexibility than a regular expression. Patterns honor the following special matching characters:
•A matches any uppercase letter
•a matches any lowercase letter
•X matches any letter
•# matches a digit
•* matches any number of characters
•? matches any single character
•[list] matches any one character in the list (e.g. [+-] matches either a plus or a minus sign)
•\ escapes the next character so it is interpreted as a literal (\A matches only "A")
•Other characters match themselves
Examples:
extractPattern(regex="[A-Z]{1,3}-\d\d\d\d") returns one to three uppercase letters, a hyphen, and four digits, if they are present in the input. The value is quoted because it contains a comma.
extractPattern(regex="Customer: (\d+)") returns a customer number, if it is one or more digits following a text string "Customer: ".
extractPattern(pattern=AA-####) matches two uppercase letters, a dash, and four digits.
extractPattern(pattern=\AX-*-#) matches an A followed by any letter, a dash, any number of characters, another dash, and a digit.
|
firstLine
|
Yes
|
Returns the first line in a zone that is not empty or a page header.
|
firstWord
|
Yes
|
Returns the first word from the first line in a zone that is not empty or a page header.
|
forceUndetectedDecimal
|
No
|
Scans lines for numeric values that lack a decimal point. Such values are assumed to be misreads, and each is adjusted by scaling it to the expected number of decimal places. For example, with decimals=2, a value of "199 00" becomes "199.00".
Parameters:
decimals=n - the number of decimal places expected in column values.
|
lastLine
|
Yes
|
Returns the last line that is not empty or a page header.
|
lastWord
|
Yes
|
Returns the last word from the last line that is not empty or a page header.
|
MapValue
|
No
|
Modifies values in a multi-line string based on a map of original and new values. For example, you can normalize unit of measure values by mapping vendor values to your own values.
Parameters:
FromList=list - a delimited list of values to convert from
ToList=list - a delimited list of corresponding values to convert to
Delim=delimiter - the list value separator, which defaults to a comma
Examples:
MapValue(FromList="Each,Case,Dozen",ToList="EA,CS,DZ")
MapValue(FromList={param.VendorUOM$},ToList={param.OurUOM$})
|
numColDecimals
|
No
|
Reformats lines of numbers so that each has a specific decimal precision. Non-numeric lines become empty lines. Optionally, provide a format mask for more control over the output.
Parameters:
decimals=number - the number of decimal places numbers must have
format=mask - a formatting mask, using #,0,-,. characters
Examples:
numColDecimals(decimals=3) - a row with "1,123.24" would become "1123.240"
numColDecimals(decimals=,format="-###,###.000") - note the quotes to protect the format comma
The mask can contain any text, plus the following placeholder characters: 0=zero-filled digit, #=space-filled digit, "."=decimal point, ","=thousands separator, and -, (, ), or CR for negative numbers.
|
OCRLettersToNumbers
|
No
|
Replaces common character misreads in what should be numeric data.
|
parseDelimitedDate
|
Yes
|
Converts a delimited date value to a new format.
Parameters:
DateOrder=mdy|dmy|ymd - the order of the date segments, used when parsing the date.
Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders.
Alpha=y|n - if set to y or yes, month names are converted to month numbers before parsing.
Examples:
parseDelimitedDate(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)
|
RemoveChars
|
Yes
|
Removes a set of characters from each line.
Parameters:
CharList=chars or $hexstr$ - a sequence of characters or a hexadecimal string, each character of which is removed from each line
Examples:
RemoveChars(CharList="-,.#") - removes hyphens, commas, periods, and hash signs, using a quoted string to protect the comma from being parsed.
RemoveChars(CharList=$0922$) - removes tab characters and quote characters.
|
RemovePageHdrRows
|
Yes
|
Removes page header rows ([1], [2], etc.) from the zone.
|
RemoveSpaces
|
Yes
|
Removes spaces from each line in the zone. This is useful in cases where a value should not have spaces, but font calculations by the OCR engine have inserted spaces.
|
trimLines
|
Yes
|
Removes page headers and empty lines.
|
UnwrapHyphenWords
|
No
|
Looks for lines that end with a hyphen (-) and are followed by a line with data. Where found, the second line is appended to the first line, and the second line becomes an empty line. Note the hyphen is not removed, as often it is part of valid data. To remove it, use RemoveChars.
|
validChars
|
Yes
|
This filter looks for a specific line, then filters the value in that line so it only contains characters for a type of value. A similar but more flexible filter is ExtractPattern.
Parameters:
type=class - defines the set of valid characters
firstlast=first|last|lastallpages - first line, last line of first page, last line of all pages
The class type can be one of these:
•letters - uppercase and lowercase letters
•alpha - letters and digits
•alphanum - letters, digits, spaces, and punctuation
•alphanum2 - uppercase letters, digits, and punctuation
•numbers - digits, commas, decimal points, and the minus sign
•numbersparen - the numbers characters, plus parentheses
•digits - digits only
•datechars - digits, forward slash, and hyphen
•datecharsalpha - digits, letters, comma, forward slash, space, and hyphen
After character filtering, if type=datecharsalpha, month names are converted to their digit equivalents.
|
VerticalTotalZoneMatch
|
Yes
|
Scans lines for a label or pattern match, and returns a value found to the right of that label or pattern. This can be used to extract a specific value, such as a total, freight, or tax amount, from a zone. It can look for a single value, or it can sum multiple values.
Parameters:
label=string or string1|string2... - a label or ~regex pattern, or a pipe-separated list of them
textmode=y - return a text value instead of a numeric one
The label value can contain multiple items separated by a pipe (|). Each item can be either a text string or a ~regex pattern. A text string can contain spaces to match more than one word preceding the target value.
textmode=y is supported only with a single label value.
Examples:
VerticalTotalZoneMatch(label=~\w+ Tax:) returns the numeric value to the right of text matching one or more word characters, a space, and "Tax:".
VerticalTotalZoneMatch(label=CA Tax|LA Tax) returns a sum of values to the right of "CA Tax" and "LA Tax".
VerticalTotalZoneMatch(label=Carrier:,textmode=y) returns any text to the right of Carrier:.
|
wordAfter
|
Yes
|
Returns a word after a specified word or words. The input text is first flattened to not have any line breaks, and to only have single spacing. This is useful to extract a value that follows a header, either to the left or above a target value.
Parameters:
Searchfor=word(s) - a case-insensitive word or words that precedes the target value
Examples:
wordAfter(Searchfor=Purchase Order) - returns a word after or below "Purchase Order".
|
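The simplified extractPattern syntax maps naturally onto regular expressions. The Python sketch below translates each special pattern character into a regex fragment and then applies the group rule described for the regex parameter. It is an illustration under assumptions: it does not remove page headers first, Python's re module stands in for PCRE, and the function names are hypothetical.

```python
import re

# Special pattern characters and their regex equivalents.
_PATTERN_MAP = {"A": "[A-Z]", "a": "[a-z]", "X": "[A-Za-z]",
                "#": r"\d", "*": ".*", "?": "."}


def pattern_to_regex(pattern: str) -> str:
    """Translate the simplified extractPattern syntax into a Python regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # \c matches c literally
            i += 2
            continue
        if ch == "[":
            end = pattern.find("]", i + 1)
            if end != -1:  # [list] matches any one listed character
                body = pattern[i + 1:end]
                out.append("[" + "".join(re.escape(c) for c in body) + "]")
                i = end + 1
                continue
        out.append(_PATTERN_MAP.get(ch, re.escape(ch)))  # others match themselves
        i += 1
    return "".join(out)


def extract_pattern(text: str, regex: str = None, pattern: str = None) -> str:
    if regex is None and pattern is None:
        raise ValueError("supply regex or pattern")
    rx = regex if regex is not None else pattern_to_regex(pattern)
    m = re.search(rx, text)
    if m is None:
        return ""
    # If the expression has a parenthesized group, return only the group.
    return (m.group(1) or "") if m.re.groups else m.group(0)


print(extract_pattern("PO: XY-2041 rush", pattern="AA-####"))
print(extract_pattern("Ref AB-xyz-7", pattern=r"\AX-*-#"))
```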
Grid Filters
These filters manipulate grid data, which is stored internally as a tab-separated values list whose first row contains headers based on the names of the column zones in the grid. Grid filters are designed to retain column relations across rows. Many of them use the grid object, which provides a large number of methods for manipulating such data.
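To make the tab-separated layout concrete, here is a Python sketch that parses grid text into a header and rows, then emulates KeepPatternRows. It assumes rows are separated by a line feed, that each row has as many cells as the header, and that a non-regex SearchFor value must equal the whole cell. These are assumptions, not documented UnForm behavior, and the function names are hypothetical.

```python
import re


def parse_grid(tsv: str):
    """Split grid text into a header list and a list of row lists."""
    lines = tsv.split("\n")
    return lines[0].split("\t"), [line.split("\t") for line in lines[1:]]


def to_tsv(header, rows) -> str:
    return "\n".join("\t".join(cells) for cells in [header] + rows)


def _matches(value: str, search_for: str) -> bool:
    # "~regex" searches for a pattern; anything else is compared literally
    # (whole-value equality is an assumption of this sketch).
    if search_for.startswith("~"):
        return re.search(search_for[1:], value) is not None
    return value == search_for


def keep_pattern_rows(tsv: str, search_col: str, search_for: str) -> str:
    header, rows = parse_grid(tsv)
    col = header.index(search_col)  # locate the column by its zone name
    return to_tsv(header, [r for r in rows if _matches(r[col], search_for)])


grid = "Item\tDesc\tQty\nAB1234\tWidget\t2\n\tblue finish\t\nFreight\tShipping\t"
print(keep_pattern_rows(grid, "Item", "~[A-Z0-9]{4,16}"))
```

Because whole rows are kept or dropped, every column stays aligned, which is the property that distinguishes grid filters from the text filters above.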
Name
|
Description
|
colTotalNamedCol
|
Returns the sum of numbers in a given column, defined by name or column number.
Parameters:
colName=name or number - the grid column to total, by name or column number.
Examples:
colTotalNamedCol(colName=Extension)
colTotalNamedCol(colName={param.ExtCol$})
|
FieldFromGridRow
|
Scans a grid column, from bottom to top, for a value or regular expression pattern. If found, it takes the value from the same row in another column, places that value in a data field defined in the job, and removes the row from the grid. Unlike a typical filter, which only returns its result in out$, this filter also updates a data field value.
Parameters:
SearchFor=value or ~regex - what to look for in SearchCol
SearchCol=name or number - what column to scan for SearchFor
FieldName=name - the data field name in the job definition where a value should be placed
FieldCol=name or number - the column that contains the value to place in the data field
Examples:
FieldFromGridRow(SearchFor=~Freight:,SearchCol=ItemDesc,FieldName=Shipping,FieldCol=Price)
|
FindGridRow
|
Scans a grid column for a value or pattern, and returns that row's value from another column. This is useful for extracting data values from a grid, such as freight or tax values, in a data field filter. It is similar to FieldFromGridRow, but is typically used in a data field rather than a grid zone.
Parameters:
SearchFor=value or ~regex - what to look for in SearchCol
SearchCol=name or number - what column to scan for SearchFor
ReturnCol=name or number - the column containing the row data to return
Examples:
FindGridRow(SearchFor=CA Tax,SearchCol=Item,ReturnCol=Extension)
|
initCol
|
Initializes a grid column to all empty rows. This can be used to clear a column after another filter has used data in it and it is no longer needed.
Parameters:
Col=name - the name of the column to initialize
|
JoinRows
|
Joins groups of rows into single rows by appending row values in every column. A new group starts at each row where the search column (or column list) matches a value or pattern; an empty row also starts a new group. This works well for structured line detail that has a top row for each line item and stacked data in some columns. Appended values are separated by RowDelim, which defaults to a space.
Parameters:
SearchCol=name or name1,name2... - columns to search
SearchFor=value or ~regex - a value or pattern to scan SearchCol for
RowDelim=text - the text to place between joined row values; defaults to a space
Examples:
JoinRows(SearchCol=LineNo,SearchFor=~\d,RowDelim=|) - scans the LineNo column for a digit. Each matching row is treated as the top row of a group, and the rows below it are appended to it, column by column, until the next LineNo match or an empty row. A pipe (|) is inserted between joined row values.
|
KeepPatternRows
|
Filters the rows in a grid, keeping only those where a specified column matches a value or pattern. This is useful for keeping only item detail rows, and is often used after joining rows.
Parameters:
SearchCol=name - the column to scan
SearchFor=value or ~regex - the value or pattern to match in SearchCol - non matching rows are removed
Examples:
KeepPatternRows(SearchCol=Item,SearchFor="~[A-Z0-9]{4,16}") - keeps rows whose Item value contains a run of 4 to 16 uppercase letters or digits. The value is quoted because it contains a comma.
|
MapGridCol
|
Modifies values in a grid column based on a map of original and new values. For example, you can normalize unit of measure values by mapping vendor values to your own values.
Parameters:
ColName=name - name of column to scan
FromList=list - a delimited list of values to convert from
ToList=list - a delimited list of corresponding values to convert to
Delim=delimiter - the list value separator, which defaults to a comma
Examples:
MapGridCol(ColName=UOM,FromList="Each,Case,Dozen",ToList="EA,CS,DZ")
|
RemoveEmptyRows
|
Removes rows that have no content in any column.
|
RemoveGridFooter
|
Scans one or more columns for the last occurrence of a value or regular expression, and removes that row and all rows below it.
Parameters:
SearchCols=col1|col2|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched.
SearchValue=value or ~regex - a value or regular expression to search for
Examples:
RemoveGridFooter(SearchCols=Item|Description,SearchValue=Subtotal)
|
RemoveGridHeader
|
Scans one or more columns for the first occurrence of a value or regular expression, and removes that row and all rows above it. This filter can scan the document as a whole, or scan each page separately.
Parameters:
SearchCols=col1|col2|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched.
SearchValue=value or ~regex - a value or regular expression to search for
ByPage=y - set to y or yes to scan each page for headers to remove
Examples:
RemoveGridHeader(SearchCols=Item,SearchValue=Part No,ByPage=y)
|
RemoveGridRows
|
This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.
|
RemovePageHdrRows
|
Removes page header rows ([1], [2], etc.) from the grid.
|
RemovePatternRows
|
Removes rows where a column matches a value or regular expression pattern. This is the opposite of the KeepPatternRows filter.
Parameters:
SearchCol=name or number - the column to search
SearchFor=value or ~regex - the value or pattern to search for in the specified column
Examples:
RemovePatternRows(SearchCol=LineNo,SearchFor=~[A-Z][A-Z]) - removes rows where the LineNo column contains two consecutive uppercase letters.
RemovePatternRows(SearchCol=Item,SearchFor=Tax) - removes rows with Item values of "Tax".
|
SplitColumn
|
Splits the values in a column into two columns. The original column retains the first portion of each split value, and a second column receives the second portion. This is useful when one column contains two distinct pieces of information, such as a quantity and a unit of measure.
Parameters:
FirstCol=colname - the column that contains the data to split, and retains the first portion of the split data
SecondCol=colname - the column to place the second portion of the split data
Splitter=number or character - defines how the split is performed
Splitter Options:
•Positive number, the first portion is this many characters from the start of the data, the second portion is the remaining characters
•Negative number, the second portion is this many characters from the end of the data, the first portion is the remaining characters
•~^regex - a regular expression anchored to the front of the string - the first portion is the matched characters, the second portion is the remaining characters
•~regex$ - a regular expression anchored to the end of the string - the second portion is the matched characters, the first portion is the remaining characters
•~regex - an unanchored regular expression - the second portion is the matched characters, the first portion is the original value with the matched portion removed. If there is a parenthesized group, that group's value is used as the second portion and the entire match is removed to form the first portion
•delimiter - a string delimiter splits the value on that delimiter, and the first portion is the data before the delimiter, and the second portion is the data after the delimiter
Examples:
SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=-2) would split "123EA" into 123 and EA, by using the last 2 characters as the second portion.
SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~^\d+) would split "123EA" into 123 and EA, by matching one or more digits at the start of the string to be the first portion.
SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~[A-Z]+$) would split "123EA" into 123 and EA, by matching one or more uppercase letters at the end of the string to be the second portion.
SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=/) would split "123/EA" into 123 and EA, using "/" as a delimiter.
|
StackedLineCleanup
|
This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.
|
|
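The SplitColumn splitter options can be sketched in Python as follows. Several behaviors are assumptions rather than documented UnForm behavior: values that a regex does not match are left unsplit, a delimiter splits at its first occurrence, and rows are line-feed separated with equal cell counts. Python's re module stands in for PCRE, and the function names are hypothetical.

```python
import re


def split_value(value: str, splitter: str):
    """Return (first, second) portions of one cell value."""
    if re.fullmatch(r"-?\d+", splitter):
        n = int(splitter)
        # Positive n: the first n characters are the first portion.
        # Negative n: the last |n| characters are the second portion.
        # Python slicing expresses both cases the same way.
        return value[:n], value[n:]
    if splitter.startswith("~"):
        rx = splitter[1:]
        m = re.search(rx, value)
        if m is None:
            return value, ""  # assumption: unmatched values stay unsplit
        if rx.startswith("^"):  # anchored at the front: match is the first portion
            return m.group(0), value[m.end():]
        if rx.endswith("$"):  # anchored at the end: match is the second portion
            return value[:m.start()], m.group(0)
        # Unanchored: remove the whole match; a group, if any, is the second portion.
        second = (m.group(1) or "") if m.re.groups else m.group(0)
        return value[:m.start()] + value[m.end():], second
    first, _, second = value.partition(splitter)  # plain delimiter
    return first, second


def split_column(tsv: str, first_col: str, second_col: str, splitter: str) -> str:
    lines = tsv.split("\n")
    header = lines[0].split("\t")
    i, j = header.index(first_col), header.index(second_col)
    out = [lines[0]]
    for line in lines[1:]:
        cells = line.split("\t")
        cells[i], cells[j] = split_value(cells[i], splitter)
        out.append("\t".join(cells))
    return "\n".join(out)


print(split_column("Qty\tUOM\n123EA\t", "Qty", "UOM", "-2"))
```

The four documented examples (-2, ~^\d+, ~[A-Z]+$, and /) all turn "123EA" or "123/EA" into "123" and "EA" under this sketch.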