Standard Filters

Standard filters are provided with every UnForm installation, available for use in Image Manager job definitions. The tables below describe the use of each of these standard filters. Using the script editing tool, it is possible to copy these filters to the filters.custom.ini file, and use the filter's code as the basis for a new custom filter.

All filters are designed to convert an input string to an output string. The output string is used in place of the original string when the filter is applied. Filter code must use two variables, in$ and out$, which hold the input and output data. The in$ variable is pre-populated when the filter code starts to run, and it is the code's job to set the out$ variable to the desired result. The out$ variable always starts as null ("") when the filter code is run.

Many filters use parameters, which are case-sensitive named values, passed inside parentheses as name=value pairs. Parameter values can be literal values or expressions enclosed in curly braces. Often an expression will refer to a job parameter, like {param.taxheader$}, or another zone or data field. If a parameter value contains a comma or parenthesis, it should be quoted.
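
The name=value syntax described above can be modeled with a short Python sketch. This is an illustration only, not UnForm's parser: the function name parse_filter_call is hypothetical, curly-brace expressions are kept as literal text rather than evaluated, and the handling of quotes is an assumption based on the rule that values containing a comma or parenthesis should be quoted.

```python
def parse_filter_call(text):
    """Split 'name(a=1,b="x,y")' into (name, {param: value}).

    Hypothetical helper for illustration; not part of UnForm.
    """
    name, sep, rest = text.partition("(")
    if not sep:
        return name.strip(), {}          # filter used without parameters
    body = rest.rstrip()
    if body.endswith(")"):
        body = body[:-1]                 # drop the closing parenthesis

    # Split on commas that are not inside double quotes.
    items, current, in_quotes = [], [], False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        items.append("".join(current))

    params = {}
    for item in items:
        key, _, value = item.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]          # remove the protective quotes
        params[key.strip()] = value      # names stay case-sensitive
    return name.strip(), params


name, params = parse_filter_call('concat(val1=Customer,val2={field.CustID$},dlm=": ")')
print(name, params)
```

In this model the quoted delimiter `": "` survives as a two-character value, and `{field.CustID$}` is passed through untouched for a later expression step.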

In job design, filters can be chained together to achieve an end result, such as removing headers, removing footers, and joining rows. When chained, the output from one filter is used as the input for the next filter in the zone or field definition.
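
Chaining amounts to function composition: each filter receives the previous filter's output as its input. The Python sketch below models that idea with two stand-in filters loosely based on RemoveChars and RemoveSpaces. The function names and the "\n" line separator are assumptions for illustration; real filters run as UnForm filter code with in$ and out$.

```python
from functools import reduce


def remove_chars(char_list):
    """Build a stand-in filter that deletes every character in char_list."""
    def run(text):
        return "".join(ch for ch in text if ch not in char_list)
    return run


def remove_spaces(text):
    """Stand-in filter that deletes spaces from each line."""
    return "\n".join(line.replace(" ", "") for line in text.split("\n"))


def apply_chain(filters, value):
    """Feed the output of each filter into the next one, in order."""
    return reduce(lambda current, f: f(current), filters, value)


print(apply_chain([remove_chars("-"), remove_spaces], "INV - 10 42"))  # INV1042
```

Order matters in a chain: a filter that removes lines changes what later filters see, so filters that clean up structure usually come first.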

Many filters are applied to zone data, either OCR or barcode, which is text lifted from a specific position on one or more pages. This data generally has multiple lines and includes page headers, so filters are often designed to get rid of this extra information. A page header is a line with this structure: [page]. Many filters remove these page header rows as part of their logic, using one of these functions:

•iszonepagehdr(x$) returns true (1) if a text value, normally a single line, is a page header (the whole value is "[n]")

•removezonepagehdrs(x$) removes all page headers from the string passed to it. There is also a filter named RemovePageHdrRows that executes this function in a filter context.
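
As a rough model of these two functions, the following Python sketch treats a page header as a line whose whole value is "[n]" with n made of digits. The Python names, the "\n" line separator, and the exact-match rule (no tolerance for surrounding whitespace) are assumptions; the built-in functions may behave differently in edge cases.

```python
import re

# A page header line is assumed to be exactly "[" + digits + "]".
_PAGE_HDR = re.compile(r"\[\d+\]")


def is_zone_page_hdr(value):
    """Return 1 if the whole value is a page header such as "[2]", else 0."""
    return 1 if _PAGE_HDR.fullmatch(value) else 0


def remove_zone_page_hdrs(text):
    """Drop every page header line and keep the remaining lines in order."""
    kept = [line for line in text.split("\n") if not is_zone_page_hdr(line)]
    return "\n".join(kept)


zone = "[1]\nAcme Corp\n[2]\n123 Main St"
print(remove_zone_page_hdrs(zone))
```

Note that a value such as "Ref [2]" is not a page header under this rule, because the bracketed number is only part of the line.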

These filters are designed to work with single- or multi-line text, where the text is not structured into columns. Examples are document identifiers, name and address blocks, subtotals and totals, or dates. They differ from grid data, which is composed of rows of discrete column data. These filters can also work on grid column zones, but to retain column relations across rows, avoid filters that remove headers or lines in grid column zones.

Name	Removes Headers	Description
anyText	Yes	Removes page headers and converts line breaks to spaces, so all words are on one line.
concat	No	Concatenates two values, with an optional delimiter. The values and delimiter are passed as parameters. Parameters: val1=value val2=value dlm=delimiter Examples: concat(val1=Customer,val2={field.CustID$},dlm=": ")
dateColumn	No	Formats dates in a multi-line string, returning a string with the same number of lines, with each line either a formatted date or null. This is useful in a grid column with date values. Parameters: DateOrder=mdy\|dmy\|ymd - what order are the date segments, for date parsing purposes. Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders. Alpha=y\|n - if set to y or yes, month names are first converted to month numbers before the parse. Examples: dateColumn(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)
extractPattern	Yes	Returns data from a string that matches a regular expression pattern, or a simplified pattern. If the regex contains a parenthesized group, the group is returned. UnForm supports Perl compatible regular expressions (PCRE). Parameters: regex=regular-expression (may contain one parenthesized group) pattern=pattern The pattern is designed to be simpler to use, but offers less flexibility than a regular expression. Patterns honor the following special matching characters: •A matches any uppercase letter •a matches any lowercase letter •X matches any letter •# matches a digit •* matches any number of characters •? matches any single character •[list] matches any one character in the list (e.g., [+-] matches either plus or minus) •\ escapes the next character so it is interpreted as a literal (\A matches only "A") •Other characters match themselves Examples: extractPattern(regex=[A-Z]{1,3}-\d\d\d\d) returns one to three uppercase letters, a hyphen, and 4 digits, if they are present in the input. extractPattern(regex="Customer: (\d+)") returns a customer number, if it is one or more digits following a text string "Customer: ". extractPattern(pattern=AA-####) matches two uppercase letters, a dash, and four digits. extractPattern(pattern=\AX-*-#) matches an A followed by any letter, a dash, any number of characters, another dash, and a digit.
firstLine	Yes	Returns the first line in a zone that is not empty or a page header.
firstWord	Yes	Returns the first word from the first line in a zone that is not empty or a page header.
forceUndetectedDecimal	No	Scans lines looking for values that do not have a decimal. Numeric values that do not have a decimal point are assumed to be a misread and the value is adjusted by scaling it for decimals. For example, a value of "199 00" would become "199.00". Parameters: decimals=n - the number of decimal places expected in column values.
lastLine	Yes	Returns the last line that is not empty or a page header.
lastWord	Yes	Returns the last word from the last line that is not empty or a page header.
MapValue	No	Modifies values in a multi-line string based on a map of original and new values. For example, you can normalize unit of measure values by mapping vendor values to your own values. Parameters: FromList=list - a delimited list of values to convert from ToList=list - a delimited list of corresponding values to convert to Delim=delimiter - the list value separator, which defaults to a comma Examples: MapValue(FromList="Each,Case,Dozen",ToList="EA,CS,DZ") MapValue(FromList={param.VendorUOM$},ToList={param.OurUOM$})
numColDecimals	No	Reformats lines of numbers to ensure they have a certain decimal precision. Non-number lines become empty lines. Optionally provide a format for the numbers, to enable more control. Parameters: decimals=number - the number of decimal places numbers must have format=mask - a formatting mask, using #,0,-,. characters Examples: numColDecimals(decimals=3) - a row with "1,123.24" would become "1123.240" numColDecimals(decimals=,format="-###,###.000") - note the quotes to protect the format comma The mask can contain any text, plus the following placeholder characters: 0=zero filled digit, #=space filled digit, "."=decimal point, ","=thousands separator, -, (, ), and CR for negative numbers.
OCRLettersToNumbers	No	Replaces common character misreads in what should be numeric data.
parseDelimitedDate	Yes	Converts a delimited date value to a new format. Parameters: DateOrder=mdy\|dmy\|ymd - what order are the date segments, for date parsing purposes. Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders. Alpha=y\|n - if set to y or yes, month names are first converted to month numbers before the parse. Examples: parseDelimitedDate(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)
RemoveChars	Yes	Removes a set of characters from each line. Parameters: CharList=chars or $hexstr$ - a sequence of characters or a hexadecimal string, each character of which is removed from each line Examples: RemoveChars(CharList="-,.#") - removes hyphens, commas, periods, and hash signs, using a quoted string to protect the comma from being parsed. RemoveChars(CharList=$0922$) - removes tab characters and quote characters.
RemovePageHdrRows	Yes	Removes page header rows ([1], [2], etc.) from the zone.
RemoveSpaces	Yes	Removes spaces from each line in the zone. This is useful in cases where a value should not have spaces, but font calculations by the OCR engine have inserted spaces.
trimLines	Yes	Removes page headers and empty lines.
UnwrapHyphenWords	No	Looks for lines that end with a hyphen (-) and are followed by a line with data. Where found, the second line is appended to the first line, and the second line becomes an empty line. Note the hyphen is not removed, as often it is part of valid data. To remove it, use RemoveChars.
validChars	Yes	This filter looks for a specific line, then filters the value in that line so it only contains characters for a type of value. A similar but more flexible filter is extractPattern. Parameters: type=class - defines the set of valid characters firstlast=first\|last\|lastallpages - first line, last line of first page, last line of all pages The class type can be one of these: •letters=uppercase and lowercase letters •alpha=letters and digits •alphanum=letters, digits, spaces, punctuation •alphanum2=uppercase letters, digits, punctuation •numbers=digits, commas, decimal points, minus sign •numbersparen=numbers and parentheses •digits=only digits •datechars=digits, forward slash, hyphen •datecharsalpha=digits, letters, comma, forward slash, space, and hyphen After character filtering, if type=datecharsalpha, month names are converted to digit equivalents.
VerticalTotalZoneMatch	Yes	Scans lines for a label or pattern match, and returns a value found to the right of that label or pattern. This can be used to extract a specific value, such as a total, freight, or tax amount, from a zone. It can look for a single value, or it can sum multiple values. Parameters: label=string or string1\|string2... a label or ~regex pattern, or a pipe-separated list of them textmode=y to return a textual value instead of a numeric one The label value can contain multiple items separated with a pipe (\|). Each of these can be either a text string or a ~regex pattern. A text string can contain spaces to look for more than one word preceding the target value. The textmode=y is only supported for single label values. Examples: VerticalTotalZoneMatch(label=~\w+ Tax:) returns a numeric value to the right of a line that contains word characters, a space, and the word "Tax:" VerticalTotalZoneMatch(label=CA Tax\|LA Tax) returns a sum of values to the right of "CA Tax" and "LA Tax". VerticalTotalZoneMatch(label=Carrier:,textmode=y) returns any text to the right of Carrier:.
wordAfter	Yes	Returns a word after a specified word or words. The input text is first flattened to not have any line breaks, and to only have single spacing. This is useful to extract a value that follows a header, either to the left or above a target value. Parameters: Searchfor=word(s) - a case-insensitive word or words that precedes the target value Examples: wordAfter(Searchfor=Purchase Order) - returns a word after or below "Purchase Order".
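
To make the simplified pattern characters of extractPattern concrete, the Python sketch below translates a pattern into an equivalent regular expression and returns the first match. It is a model under stated assumptions, not UnForm's implementation: the function names are hypothetical, "*" is treated as a greedy ".*", and an input with no match returns an empty string.

```python
import re

# Special pattern characters and the regex fragment each one stands for.
_CLASSES = {
    "A": "[A-Z]",      # any uppercase letter
    "a": "[a-z]",      # any lowercase letter
    "X": "[A-Za-z]",   # any letter
    "#": r"\d",        # a digit
    "*": ".*",         # any number of characters (greediness is assumed)
    "?": ".",          # any single character
}


def pattern_to_regex(pattern):
    """Translate a simplified pattern into a regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))   # escaped literal
            i += 2
            continue
        if ch == "[":
            end = pattern.find("]", i + 1)
            if end > i + 1:                          # non-empty [list]
                out.append("[" + re.escape(pattern[i + 1:end]) + "]")
                i = end + 1
                continue
        out.append(_CLASSES.get(ch, re.escape(ch)))  # class or literal
        i += 1
    return "".join(out)


def extract_pattern(text, pattern):
    """Return the first substring of text that matches the pattern, or ""."""
    match = re.search(pattern_to_regex(pattern), text)
    return match.group(0) if match else ""


print(extract_pattern("Order AB-1234 shipped", "AA-####"))  # AB-1234
print(extract_pattern("ref AB-12x-7 end", r"\AX-*-#"))      # AB-12x-7
```

The second call shows the escape rule from the table: `\A` matches only a literal "A", while the following `X` matches any letter.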

These filters are designed to manipulate grid data, which is internally stored as a tab-separated-values list with a first row of headers based on the column zone names that the grid contains. Grid filters are designed to retain column relations across rows. They often use the grid object, which provides a large number of methods to manipulate such data.
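
The tab-separated layout can be pictured with a small Python sketch that parses a grid and totals one column, in the spirit of colTotalNamedCol. The function names are hypothetical and this does not use the UnForm grid object. It assumes "\n" row separators, 1-based column numbers, and that thousands separators are ignored and non-numeric cells are skipped when summing.

```python
def parse_grid(tsv):
    """Split tab-separated grid text into (headers, data_rows)."""
    rows = [line.split("\t") for line in tsv.split("\n")]
    return rows[0], rows[1:]


def column_index(headers, col):
    """Resolve a column given by name or by number (1-based is an assumption)."""
    if isinstance(col, int) or col.isdigit():
        return int(col) - 1
    return headers.index(col)


def column_total(tsv, col):
    """Sum the numeric cells of one column, skipping non-numeric cells."""
    headers, rows = parse_grid(tsv)
    idx = column_index(headers, col)
    total = 0.0
    for row in rows:
        cell = row[idx].replace(",", "") if idx < len(row) else ""
        try:
            total += float(cell)
        except ValueError:
            pass                      # empty or non-numeric cell
    return total


grid = "Item\tExtension\nA1\t10.50\nB2\t1,000.25\nNote\t"
print(column_total(grid, "Extension"))  # 1010.75
```

Because every row keeps the same column positions, a filter that removes or joins rows must act on whole rows, which is why grid filters operate on the grid rather than on individual column text.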

Name	Description
colTotalNamedCol	Returns the sum of numbers in a given column, defined by name or column number. Parameters: colName=name or number, specifies which grid column to use. Examples: colTotalNamedCol(colName=Extension) colTotalNamedCol(colName={param.ExtCol$})
FieldFromGridRow	Scans a grid column, bottom to top, for a value or a regular expression pattern. If a match is found, the filter gets a value from the same row in another column and places that value in a data field defined in the job. That row is then removed from the grid. Note how this differs from a typical filter in that a data field value is updated, rather than simply being returned in out$. Parameters: SearchFor=value or ~regex - what to look for in SearchCol SearchCol=name or number - what column to scan for SearchFor FieldName=name - the data field name in the job definition where a value should be placed FieldCol=name or number - the column that contains the value to place in the data field Examples: FieldFromGridRow(SearchFor=~Freight:,SearchCol=ItemDesc,FieldName=Shipping,FieldCol=Price)
FindGridRow	Scans a grid column for a value or pattern, and returns that row's value from another column. This is useful for extracting data values from a grid, such as freight or tax value, in a data field filter. It is similar to FieldFromGridRow, but is typically used in a data field rather than a grid zone. Parameters: SearchFor=value or ~regex - what to look for in SearchCol SearchCol=name or number - what column to scan for SearchFor ReturnCol=name or number - the column containing the row data to return Examples: FindGridRow(SearchFor=CA Tax,SearchCol=Item,ReturnCol=Extension)
initCol	Initializes a grid column to all empty rows. This can be used to clear a column after another filter has used data in it and it is no longer needed. Parameters: Col=name - the name of the column to initialize
JoinRows	Joins rows together based on a value or pattern in a specific column or column list, by appending row values together in all columns. The logic groups rows together based on a value or pattern in a specific column. An empty row also starts a new group. This works for well structured line detail that has a top row for each line, and stacked data in some columns. When appending, a space separates the line values. Parameters: SearchCol=name or name1,name2... - columns to search SearchFor=value or ~regex - a value or pattern to scan SearchCol for RowDelim=text to place between row values, defaults to a space Examples: JoinRows(SearchCol=LineNo,SearchFor=~\d,RowDelim=\|) - scans the LineNo column for any digit, when found treat that row as the top line, and subsequent lines are appended to this one in each column, until the next LineNo match or an empty line. A pipe (\|) is inserted between each joined row.
KeepPatternRows	Filters the rows in a grid, retaining only those that match a value or pattern in a specified column. This is useful to filter out rows with just item details, often used after first joining rows. Parameters: SearchCol=name - the column to scan SearchFor=value or ~regex - the value or pattern to match in SearchCol - non matching rows are removed Examples: KeepPatternRows(SearchCol=Item,SearchFor=~[A-Z0-9]{4,16}) - keep rows with Item column values from 4 to 16 uppercase letters or digits
MapGridCol	Modifies values in a grid column based on a map of original and new values. For example, you can normalize unit of measure values by mapping vendor values to your own values. Parameters: ColName=name - name of column to scan FromList=list - a delimited list of values to convert from ToList=list - a delimited list of corresponding values to convert to Delim=delimiter - the list value separator, which defaults to a comma Examples: MapGridCol(ColName=UOM,FromList="Each,Case,Dozen",ToList="EA,CS,DZ")
RemoveEmptyRows	This removes rows that have no content in any column.
RemoveGridFooter	Scans for the last occurrence of a value or regular expression in one or more columns, and removes that row and those below it. Parameters: SearchCols=col1\|col2\|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched. SearchValue=value or ~regex - a value or regular expression to search for Examples: RemoveGridFooter(SearchCols=Item\|Description,SearchValue=Subtotal)
RemoveGridHeader	Scans for the first occurrence of a value or regular expression in one or more columns, and removes that row and those above it. This filter can work at the document level, or it can scan page by page. Parameters: SearchCols=col1\|col2\|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched. SearchValue=value or ~regex - a value or regular expression to search for ByPage=y - set to y or yes to scan each page for headers to remove Examples: RemoveGridHeader(SearchCols=Item,SearchValue=Part No,ByPage=y)
RemoveGridRows	This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.
RemovePageHdrRows	Removes page header rows ([1], [2], etc.) from the grid.
RemovePatternRows	Removes rows where a column matches a value or regular expression pattern. This is the opposite of the KeepPatternRows filter. Parameters: SearchCol=name or number - the column to search SearchFor=value or ~regex - the value or pattern to search for in the specified column Examples: RemovePatternRows(SearchCol=LineNo,SearchFor=~[A-Z][A-Z]) - remove lines with uppercase letters in the LineNo column. RemovePatternRows(SearchCol=Item,SearchFor=Tax) - removes rows with Item values of "Tax".
SplitColumn	Splits values in a column into two separate columns. One column contains the original data, and retains the first portion of the split, and a second column receives the second portion of the split. This can be used in cases where one column contains two distinct pieces of information, such as a quantity and unit of measure. Parameters: FirstCol=colname - the column that contains the data to split, and retains the first portion of the split data SecondCol=colname - the column to place the second portion of the split data Splitter=number or character - defines how the split is performed Splitter Options: •Positive number, the first portion is this many characters from the start of the data, the second portion is the remaining characters •Negative number, the second portion is this many characters from the end of the data, the first portion is remaining characters •~^regex - a regular expression anchored to the front of the string - the first portion is the matched characters, the second portion is the remaining characters •~regex$ - a regular expression anchored to the end of the string - the second portion is the matched characters, the first portion is the remaining characters •~regex - an unanchored regular expression - the second portion is the matched characters, the first portion is the original value with the matched portion removed. If there is a parenthesized group, that group's value is used as the second portion and the entire match is removed to form the first portion •delimiter - a string delimiter splits the value on that delimiter, and the first portion is the data before the delimiter, and the second portion is the data after the delimiter Examples: SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=-2) would split "123EA" into 123 and EA, by using the last 2 characters as the second portion. SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~^\d+) would split "123EA" into 123 and EA, by matching one or more digits at the start of the string to be the first portion. SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~[A-Z]+$) would split "123EA" into 123 and EA, by matching one or more uppercase letters at the end of the string to be the second portion. SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=/) would split "123/EA" into 123 and EA, using "/" as a delimiter.
StackedLineCleanup	This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.
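
The Splitter options listed for SplitColumn can be modeled on a single cell value with the Python sketch below. It is an illustration of the documented rules, not the filter's code: the function name split_value is hypothetical, Python's re module stands in for PCRE, a splitter made only of digits is assumed to be a character count rather than a delimiter, and a value with no regex match is assumed to stay whole in the first portion.

```python
import re


def split_value(value, splitter):
    """Return (first_portion, second_portion) for one cell value."""
    # Numeric splitter: positive counts from the start, negative from the end.
    if re.fullmatch(r"-?\d+", splitter):
        n = int(splitter)
        return value[:n], value[n:]

    # Regular expression splitter, introduced by "~".
    if splitter.startswith("~"):
        rx = splitter[1:]
        m = re.search(rx, value)
        if not m:
            return value, ""                      # assumption: no split
        if rx.startswith("^"):
            return m.group(0), value[m.end():]    # match is the first portion
        if rx.endswith("$"):
            return value[:m.start()], m.group(0)  # match is the second portion
        # Unanchored: a parenthesized group, if any, supplies the second
        # portion; the entire match is removed to form the first portion.
        second = m.group(1) if m.lastindex else m.group(0)
        return value[:m.start()] + value[m.end():], second

    # Plain delimiter: split on its first occurrence.
    first, _, second = value.partition(splitter)
    return first, second


print(split_value("123EA", "-2"))         # ('123', 'EA')
print(split_value("123EA", r"~^\d+"))     # ('123', 'EA')
print(split_value("123EA", "~[A-Z]+$"))   # ('123', 'EA')
print(split_value("123/EA", "/"))         # ('123', 'EA')
```

All four calls mirror the examples in the SplitColumn row and give the same result, which shows that the choice of splitter is mostly about which part of the value is more predictable: its length, its leading text, its trailing text, or a separator.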