Standard Filters


Standard filters are provided with every UnForm installation, available for use in Image Manager job definitions.  The tables below describe the use of each of these standard filters.  Using the script editing tool, it is possible to copy these filters to the filters.custom.ini file, and use the filter's code as the basis for a new custom filter.

 

All filters are designed to convert an input string to an output string.  The output string is used in place of the original string when the filter is applied.  The code of the filter must use two variables, in$ and out$, which contain the input and output data.  The in$ variable is pre-populated when the filter code starts to run, and it is the code's job to set the out$ variable to the desired result.  The out$ variable always starts as null ("") when the filter code is run.

 

Parameters

Many filters use parameters, which are case-sensitive named values, passed inside parentheses as name=value pairs.  Parameter values can be literal values or expressions enclosed in curly braces.  Often an expression will refer to a job parameter, like {param.taxheader$}, or another zone or data field.  If a parameter value contains a comma or parenthesis, it should be quoted.

 

Chained Filters

In job design, filters can be chained together to achieve an end result, such as removing headers, removing footers, and joining rows.  When chained, the output from one filter is used as the input for the next filter in the zone or field definition.
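
The in$/out$ contract and chaining can be pictured as plain function composition.  The Python sketch below is only an illustration, not UnForm filter code; the function names (remove_empty_lines, first_line, run_chain) and the line-feed line separator are assumptions made for the example.

```python
from functools import reduce

def remove_empty_lines(text):
    # Hypothetical filter: drop lines that are empty or only whitespace.
    return "\n".join(line for line in text.split("\n") if line.strip())

def first_line(text):
    # Hypothetical filter: keep only the first line.
    return text.split("\n")[0]

def run_chain(text, filters):
    # Each filter receives the previous filter's output as its input.
    return reduce(lambda value, f: f(value), filters, text)

result = run_chain("\n\nINV-1001\nAcme Corp", [remove_empty_lines, first_line])
print(result)
```

The order of the list matters: reversing it would return the empty first line instead of the first non-empty one.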

 

Page Headers

Many filters are applied to zone data, either OCR or barcode, which is text lifted off one or more pages in a specific position.  Generally this data will have multiple lines and page headers, so filters are often designed to get rid of this extra information.  A page header is a line with this structure: [page].  Many filters remove these page header rows as part of their logic, using one of these functions:

 

iszonepagehdr(x$) returns true (1) if a text value, normally a single line, is a page header (the whole value is "[n]")
 

removezonepagehdrs(x$) removes all page headers from the string passed to it.  There is also a filter named RemovePageHdrRows that executes this function in a filter context.
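
As a rough illustration of what these two functions do, here is a Python sketch.  It assumes that n in "[n]" is one or more digits and that lines are separated by line feeds; the Python names are stand-ins, not the UnForm functions themselves.

```python
import re

# Assumption: a page header is a line whose whole value is "[n]", n being digits.
PAGE_HDR = re.compile(r"\[\d+\]")

def is_zone_page_hdr(line):
    # Returns 1 (true) for a page header line, 0 otherwise.
    return 1 if PAGE_HDR.fullmatch(line.strip()) else 0

def remove_zone_page_hdrs(text):
    # Drops every page header line and keeps all other lines in order.
    return "\n".join(l for l in text.split("\n") if not is_zone_page_hdr(l))

zone = "[1]\nTotal 10.00\n[2]\nTax 0.80"
print(remove_zone_page_hdrs(zone))
```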

 

 

Text Filters

These filters are designed to work with single- or multi-line text, where the text is not structured into columns.  Examples are document identifiers, name and address blocks, subtotals and totals, or dates.  They differ from grid data, which is composed of rows of discrete column data.  These filters can also work on grid column zones, but to retain column relations across rows, avoid filters that remove headers or lines in grid column zones.

 

Name

Removes Headers

Description

anyText

Yes

Removes page headers and converts line breaks to spaces, so all words are on one line.

concat

No

Concatenates two values, with an optional delimiter.  The values and delimiter are passed as parameters.

 

Parameters:

val1=value

val2=value

dlm=delimiter

 

Examples:

concat(val1=Customer,val2={field.CustID$},dlm=": ")

 

dateColumn

No

Formats dates in a multi-line string, returning a string with the same number of lines, with each line either a formatted date or null.  This is useful in a grid column with date values.

 

Parameters:

DateOrder=mdy|dmy|ymd - what order are the date segments, for date parsing purposes.

Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders.

Alpha=y|n - if set to y or yes, month names are first converted to month numbers before the parse.

 

Examples:

dateColumn(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)
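
The line-by-line behavior can be sketched in Python as follows.  This is an approximation under stated assumptions: any non-digit run is treated as a delimiter, two-digit years are assumed to fall in 20xx, no range validation is done, and the Alpha option is not handled.

```python
import re

def date_column(text, date_order="mdy", fmt="YYYY-MM-DD"):
    out = []
    for line in text.split("\n"):
        parts = re.findall(r"\d+", line)
        if len(parts) != 3:
            out.append("")  # not a date: keep the row, emit a null line
            continue
        seg = dict(zip(date_order, parts))  # e.g. {"m": "3", "d": "7", "y": "2024"}
        # Assumption: a two-digit year belongs to the 2000s.
        year = seg["y"] if len(seg["y"]) == 4 else "20" + seg["y"].zfill(2)
        result = fmt.replace("YYYY", year).replace("YY", year[-2:])
        result = result.replace("MM", seg["m"].zfill(2)).replace("DD", seg["d"].zfill(2))
        out.append(result)
    return "\n".join(out)

print(date_column("3/7/2024\nn/a\n12-31-24"))
```

The output keeps one line per input line, which is what preserves row alignment in a grid column.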

 

 

extractPattern

Yes

Returns data from a string that matches a regular expression pattern, or a simplified pattern.  If the regex contains a parenthesized group, the group is returned.  UnForm supports Perl compatible regular expressions (PCRE).

 

Parameters:

regex=regular-expression (may contain one parenthesized group)

pattern=pattern

 

The pattern is designed to be simpler to use, but offers less flexibility than a regular expression.  Patterns honor the following special matching characters:

A matches any uppercase letter

a matches any lowercase letter

X matches any letter

# matches a digit

* matches any number of characters

? matches any single character

[list] matches any one character in the list (e.g. [+-] matches either plus or minus)

\ escapes the next character so it is interpreted as a literal (\A matches only "A")

Other characters match themselves

 

 

Examples:

extractPattern(regex=[A-Z]{1,3}-\d\d\d\d) returns one to three uppercase letters, a hyphen, and 4 digits, if they are present in the input.

 

extractPattern(regex="Customer: (\d+)") returns a customer number, if it is one or more digits following a text string "Customer: ".

 

extractPattern(pattern=AA-####) matches two uppercase letters, a dash, and four digits.

 

extractPattern(pattern=\AX-*-#) matches an A followed by any letter, a dash, any number of characters, another dash, and a digit.
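
One way to reason about the simplified pattern syntax is to translate it to a regular expression.  The Python sketch below does that using the special characters listed above; the regex equivalents chosen (for example ".*" for "*") and the helper names are assumptions, not the filter's actual implementation.

```python
import re

# Assumed regex equivalents for the special pattern characters.
SPECIAL = {"A": "[A-Z]", "a": "[a-z]", "X": "[A-Za-z]", "#": r"\d", "*": ".*", "?": "."}

def pattern_to_regex(pattern):
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
        elif ch == "[" and "]" in pattern[i + 1:]:
            end = pattern.index("]", i + 1)
            out.append("[" + re.escape(pattern[i + 1:end]) + "]")  # character list
            i = end + 1
        else:
            out.append(SPECIAL.get(ch, re.escape(ch)))  # special or literal
            i += 1
    return "".join(out)

def extract(pattern, text):
    m = re.search(pattern_to_regex(pattern), text)
    return m.group(0) if m else ""

print(extract("AA-####", "Order AB-1234 shipped"))
```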

 

firstLine

Yes

Returns the first line in a zone that is not empty or a page header.

firstWord

Yes

Returns the first word from the first line in a zone that is not empty or a page header.

forceUndetectedDecimal

No

Scans lines looking for values that do not have a decimal.  Numeric values that do not have a decimal point are assumed to be a misread and the value is adjusted by scaling it for decimals.  For example, a value of "199 00" would become "199.00".

 

Parameters:

decimals=n - the number of decimal places expected in column values.

 

lastLine

Yes

Returns the last line that is not empty or a page header.

lastWord

Yes

Returns the last word from the last line that is not empty or a page header.

MapValue

No

Modifies values in a multi-line string based on a map of original and new values.  For example, you can normalize unit of measure values by mapping vendor values to your own values.

 

Parameters:

FromList=list - a delimited list of values to convert from

ToList=list - a delimited list of corresponding values to convert to

Delim=delimiter - the list value separator, which defaults to a comma

 

Examples:

MapValue(FromList="Each,Case,Dozen",ToList="EA,CS,DZ")

 

MapValue(FromList={param.VendorUOM$},ToList={param.OurUOM$})
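
The mapping can be pictured as a per-line lookup.  This Python sketch assumes each line is compared as a whole, case-sensitively, and that unmapped lines pass through unchanged; those matching rules are assumptions.

```python
def map_value(text, from_list, to_list, delim=","):
    # Pair each "from" value with the "to" value at the same position.
    mapping = dict(zip(from_list.split(delim), to_list.split(delim)))
    # Replace mapped lines; leave every other line as it was.
    return "\n".join(mapping.get(line.strip(), line) for line in text.split("\n"))

print(map_value("Each\nCase\nBox", "Each,Case,Dozen", "EA,CS,DZ"))
```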

 

numColDecimals

No

Reformats lines of numbers to ensure they have a certain decimal precision.  Non-number lines become empty lines.  Optionally provide a format for the numbers, to enable more control.

 

Parameters:

decimals=number - the number of decimal places numbers must have

format=mask - a formatting mask, using #,0,-,. characters

 

 

Examples:

numColDecimals(decimals=3) - a row with "1,123.24" would become "1123.240"

numColDecimals(decimals=,format="-###,###.000") - note the quotes to protect the format comma

 

The mask can contain any text, plus the following placeholder characters: 0=zero filled digit, #=space filled digit,  "."=decimal point, ","=thousands separator, -, (, ), and CR for negative numbers.
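
For the decimals-only case, the behavior can be approximated in Python as below.  This sketch ignores the format mask, strips commas as thousands separators, and uses the decimal module's default rounding; those details are assumptions.

```python
from decimal import Decimal, InvalidOperation

def num_col_decimals(text, decimals):
    out = []
    for line in text.split("\n"):
        cleaned = line.strip().replace(",", "")
        try:
            value = Decimal(cleaned)
        except InvalidOperation:
            out.append("")  # non-number lines become empty lines
            continue
        if not value.is_finite():
            out.append("")  # treat "NaN" or "Infinity" text as non-numbers
            continue
        out.append(f"{value:.{decimals}f}")
    return "\n".join(out)

print(num_col_decimals("1,123.24\nN/A\n5", 3))
```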

 

OCRLettersToNumbers

No

Replaces common character misreads in what should be numeric data.

parseDelimitedDate

Yes

Converts a delimited date value to a new format.

 

Parameters:

DateOrder=mdy|dmy|ymd - what order are the date segments, for date parsing purposes.

Format=format - the format the dates should be in the output, using YYYY, YY, MM, or DD placeholders.

Alpha=y|n - if set to y or yes, month names are first converted to month numbers before the parse.

 

Examples:

parseDelimitedDate(DateOrder=mdy,Format=YYYY-MM-DD,Alpha=n)

 

RemoveChars

Yes

Removes a set of characters from each line.

 

Parameters:

CharList=chars or $hexstr$ - a sequence of characters or a hexadecimal string, each character of which is removed from each line

 

Examples:

RemoveChars(CharList="-,.#") - removes hyphens, commas, periods, and hash signs, using a quoted string to protect the comma from being parsed.

 

RemoveChars(CharList=$0922$) - removes tab characters and quote characters.

 

RemovePageHdrRows

Yes

Removes page header rows ([1], [2], etc.) from the zone.

RemoveSpaces

Yes

Removes spaces from each line in the zone.  This is useful in cases where a value should not have spaces, but font calculations by the OCR engine have inserted spaces.

trimLines

Yes

Removes page headers and empty lines.

UnwrapHyphenWords

No

Looks for lines that end with a hyphen (-) and are followed by a line with data.  Where found, the second line is appended to the first line, and the second line becomes an empty line.  Note the hyphen is not removed, as often it is part of valid data.  To remove it, use RemoveChars.
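
A minimal Python sketch of this logic follows.  It assumes trailing whitespace is ignored when testing for the hyphen and that a line emptied by a join is not itself re-examined.

```python
def unwrap_hyphen_words(text):
    lines = text.split("\n")
    i = 0
    while i < len(lines) - 1:
        if lines[i].rstrip().endswith("-") and lines[i + 1].strip():
            # Append the following line; the hyphen itself is kept.
            lines[i] = lines[i].rstrip() + lines[i + 1].strip()
            lines[i + 1] = ""  # the second line becomes empty, line count unchanged
            i += 2
        else:
            i += 1
    return "\n".join(lines)

print(unwrap_hyphen_words("ABC-\n123\nnext"))
```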

validChars

Yes

This filter looks for a specific line, then filters the value in that line so it only contains characters for a type of value.  A similar but more flexible filter is extractPattern.

 

Parameters:

type=class - defines the set of valid characters

firstlast=first|last|lastallpages - first line, last line of first page, last line of all pages

 

The class type can be one of these:

letters - upper and lowercase letters

alpha - letters and digits

alphanum - letters, digits, spaces, punctuation

alphanum2 - uppercase letters, digits, punctuation

numbers - digits, commas, decimal points, minus sign

numbersparen - numbers and parentheses

digits - only digits

datechars - digits, forward slash, hyphen

datecharsalpha - digits, letters, comma, forward slash, space, and hyphen

 

After character filtering, if type=datecharsalpha, month names are converted to digit equivalents.

 

 

VerticalTotalZoneMatch

Yes

Scans lines for a label or pattern match, and returns a value found to the right of that label or pattern.  This can be used to extract a specific value, such as a total, freight, or tax amount, from a zone.  It can look for a single value, or it can sum multiple values.

 

Parameters:

label=string or string1|string2... - a label or ~regex pattern, or a pipe-separated list of them

textmode=y to return a textual value instead of a numeric one

 

The label value can contain multiple items separated with a pipe (|).  Each of these can be either a text string or a ~regex pattern.  A text string can contain spaces to look for more than one word preceding the target value.

 

The textmode=y is only supported for single label values.

 

Examples:

VerticalTotalZoneMatch(label=~\w+ Tax:) returns a numeric value to the right of a line that contains word characters, a space, and the word "Tax:"

 

VerticalTotalZoneMatch(label=CA Tax|LA Tax) returns a sum of values to the right of "CA Tax" and "LA Tax".

 

VerticalTotalZoneMatch(label=Carrier:,textmode=y) returns any text to the right of Carrier:.
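
The numeric mode can be sketched in Python as shown here.  The sketch assumes the first number to the right of a match is the target value, that commas are thousands separators, and that a ~regex item does not itself contain a pipe; textmode is not covered.

```python
import re
from decimal import Decimal

NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def vertical_total(text, label):
    total = Decimal("0")
    for item in label.split("|"):
        # "~" marks a regex; anything else is matched as literal text.
        rx = item[1:] if item.startswith("~") else re.escape(item)
        for line in text.split("\n"):
            m = re.search(rx, line)
            if not m:
                continue
            num = NUMBER.search(line[m.end():])  # look only to the right of the label
            if num:
                total += Decimal(num.group(0).replace(",", ""))
    return total

zone = "Subtotal 100.00\nCA Tax 12.50\nLA Tax 3.25\nTotal 115.75"
print(vertical_total(zone, "CA Tax|LA Tax"))
```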

 

 

wordAfter

Yes

Returns a word after a specified word or words.  The input text is first flattened to not have any line breaks, and to only have single spacing.  This is useful to extract a value that follows a header, either to the left or above a target value.

 

Parameters:

Searchfor=word(s) - a case-insensitive word or words that precedes the target value

 

Examples:

wordAfter(Searchfor=Purchase Order) - returns a word after or below "Purchase Order".
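
The flattening step is what lets one filter handle both "to the right of" and "below" layouts.  A Python sketch, assuming a word is any run of non-space characters:

```python
import re

def word_after(text, search_for):
    flat = " ".join(text.split())  # remove line breaks, collapse to single spaces
    needle = re.escape(" ".join(search_for.split()))
    m = re.search(needle + r"\s+(\S+)", flat, re.IGNORECASE)
    return m.group(1) if m else ""

print(word_after("Purchase Order\nPO-7781\nDate 1/2/24", "purchase order"))
```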

 

 

 

 

Grid Filters

These filters are designed to manipulate grid data, which is internally stored as a tab-separated-values list with a first row of headers based on the column zone names that the grid contains.  Grid filters are designed to retain column relations across rows.  They often use the grid object, which provides a large number of methods to manipulate such data.

 

Name

Description

colTotalNamedCol

Returns the sum of numbers in a given column, defined by name or column number.

 

Parameters:

colName=name or number, specifies which grid column to use.

 

Examples:

colTotalNamedCol(colName=Extension)

 

colTotalNamedCol(colName={param.ExtCol$})

 

 

FieldFromGridRow

Scans a grid column, bottom to top, for a value or a regular expression pattern.  If found, it gets a value from the same row in another column and places that value in a data field defined in the job.  That row is then removed from the grid.  Note how this differs from a typical filter in that a data field value is updated, rather than simply being returned in out$.

 

Parameters:

SearchFor=value or ~regex - what to look for in SearchCol

SearchCol=name or number - what column to scan for SearchFor

FieldName=name - the data field name in the job definition where a value should be placed

FieldCol=name or number - the column that contains the value to place in the data field

 

Examples:

FieldFromGridRow(SearchFor=~Freight:,SearchCol=ItemDesc,FieldName=Shipping,FieldCol=Price)

 

FindGridRow

Scans a grid column for a value or pattern, and returns that row's value from another column.  This is useful for extracting data values from a grid, such as freight or tax value, in a data field filter.  It is similar to FieldFromGridRow, but is typically used in a data field rather than a grid zone.

 

Parameters:

SearchFor=value or ~regex - what to look for in SearchCol

SearchCol=name or number - what column to scan for SearchFor

ReturnCol=name or number - the column containing the row data to return

 

Examples:

FindGridRow(SearchFor=CA Tax,SearchCol=Item,ReturnCol=Extension)
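
Treating the grid as tab-separated text with a header row, the lookup can be sketched in Python like this.  The sketch assumes column names only (not numbers), a top-to-bottom scan that returns the first match, and substring matching for plain values.

```python
import re

def find_grid_row(grid, search_for, search_col, return_col):
    rows = [r.split("\t") for r in grid.split("\n")]
    header = rows[0]
    s, r = header.index(search_col), header.index(return_col)
    rx = search_for[1:] if search_for.startswith("~") else re.escape(search_for)
    for row in rows[1:]:
        if re.search(rx, row[s]):
            return row[r]  # value from the matching row's return column
    return ""

grid = "Item\tExtension\nWidget\t100.00\nCA Tax\t8.25"
print(find_grid_row(grid, "CA Tax", "Item", "Extension"))
```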

 

 

initCol

Initializes a grid column to all empty rows.  This can be used to clear a column after another filter has used data in it and it is no longer needed.

 

Parameters:

Col=name - the name of the column to initialize

 

JoinRows

Joins rows together by appending row values in all columns.  The logic groups rows based on a value or pattern in a specific column or column list: each matching row starts a new group, and the rows that follow are appended to it.  An empty row also starts a new group.  This works for well-structured line detail that has a top row for each line, and stacked data in some columns.  When appending, a space separates the line values unless RowDelim is specified.

 

Parameters:

SearchCol=name or name1,name2... - columns to search

SearchFor=value or ~regex - a value or pattern to scan SearchCol for

RowDelim=text to place between row values, defaults to a space

 

Examples:

JoinRows(SearchCol=LineNo,SearchFor=~\d,RowDelim=|) - scans the LineNo column for any digit.  Each matching row is treated as the top line, and subsequent lines are appended to it in each column until the next LineNo match or an empty line.  A pipe (|) is inserted between each joined row.
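
The grouping logic can be sketched in Python on a tab-separated grid.  This sketch handles a single search column, drops empty rows from the result, and skips empty cells when appending; those choices are assumptions rather than documented behavior.

```python
import re

def join_rows(grid, search_col, search_for, row_delim=" "):
    rows = [r.split("\t") for r in grid.split("\n")]
    header, body = rows[0], rows[1:]
    col = header.index(search_col)
    rx = search_for[1:] if search_for.startswith("~") else re.escape(search_for)
    out, current = [], None
    for row in body:
        if not any(cell.strip() for cell in row):
            current = None  # an empty row ends the current group
        elif re.search(rx, row[col]) or current is None:
            current = list(row)  # top row of a new group
            out.append(current)
        else:
            for i, cell in enumerate(row):
                if cell.strip():
                    current[i] = current[i] + row_delim + cell if current[i] else cell
    return "\n".join("\t".join(r) for r in [header] + out)

grid = "LineNo\tDesc\n1\tWidget\n\tblue, large\n2\tGadget"
print(join_rows(grid, "LineNo", r"~\d", "|"))
```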

KeepPatternRows

Filters the rows in a grid, retaining only those that match a value or pattern in a specified column.  This is useful to filter out rows with just item details, often used after first joining rows.

 

Parameters:

SearchCol=name - the column to scan

SearchFor=value or ~regex - the value or pattern to match in SearchCol - non matching rows are removed

 

Examples:

KeepPatternRows(SearchCol=Item,SearchFor=~[A-Z0-9]{4,16}) - keep rows with Item column values from 4 to 16 uppercase letters or digits

 

MapGridCol

Modifies values in a grid column based on a map of original and new values.  For example, you can normalize unit of measure values by mapping vendor values to your own values.

 

Parameters:

ColName=name - name of column to scan

FromList=list - a delimited list of values to convert from

ToList=list - a delimited list of corresponding values to convert to

Delim=delimiter - the list value separator, which defaults to a comma

 

Examples:

MapGridCol(ColName=UOM,FromList="Each,Case,Dozen",ToList="EA,CS,DZ")

 

RemoveEmptyRows

This removes rows that have no content in any column.

RemoveGridFooter

Scans for the last occurrence of a value or regular expression in one or more columns, and removes that row and those below it.

 

Parameters:

SearchCols=col1|col2|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched.

SearchValue=value or ~regex - a value or regular expression to search for

 

Examples:

RemoveGridFooter(SearchCols=Item|Description,SearchValue=Subtotal)
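
A Python sketch of the footer removal on a tab-separated grid follows.  It assumes column names only (not numbers) and substring matching for plain values.  RemoveGridHeader would be the mirror image: find the first match and keep only the rows below it.

```python
import re

def remove_grid_footer(grid, search_value, search_cols=""):
    rows = [r.split("\t") for r in grid.split("\n")]
    header, body = rows[0], rows[1:]
    # An empty SearchCols means every column is searched.
    cols = [header.index(c) for c in search_cols.split("|")] if search_cols else range(len(header))
    rx = search_value[1:] if search_value.startswith("~") else re.escape(search_value)
    last = None
    for i, row in enumerate(body):
        if any(re.search(rx, row[c]) for c in cols):
            last = i  # remember the last matching row
    if last is not None:
        body = body[:last]  # drop that row and everything below it
    return "\n".join("\t".join(r) for r in [header] + body)

grid = "Item\tDescription\nA1\tWidget\n\tSubtotal\n\tTax"
print(remove_grid_footer(grid, "Subtotal", "Item|Description"))
```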

 

RemoveGridHeader

Scans for the first occurrence of a value or regular expression in one or more columns, and removes that row and those above it.  This filter can work at the document level, or it can scan page by page.

 

Parameters:

SearchCols=col1|col2|... - an optional pipe-delimited list of column names or numbers, if empty all columns are searched.

SearchValue=value or ~regex - a value or regular expression to search for

ByPage=y - set to y or yes to scan each page for headers to remove

 

Examples:

RemoveGridHeader(SearchCols=Item,SearchValue=Part No,ByPage=y)

 

RemoveGridRows

This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.

RemovePageHdrRows

Removes page header rows ([1], [2], etc.) from the grid.

RemovePatternRows

Removes rows where a column matches a value or regular expression pattern.  This is the opposite of the KeepPatternRows filter.

 

Parameters:

SearchCol=name or number - the column to search

SearchFor=value or ~regex - the value or pattern to search for in the specified column

 

Examples:

RemovePatternRows(SearchCol=LineNo,SearchFor=~[A-Z][A-Z]) - remove lines with uppercase letters in the LineNo column.

 

RemovePatternRows(SearchCol=Item,SearchFor=Tax) - removes rows with Item values of "Tax".
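
Since KeepPatternRows and RemovePatternRows are opposites, both can be illustrated with one Python helper and a keep flag.  The helper name, the flag, and substring matching for plain values are assumptions made for the sketch.

```python
import re

def filter_rows(grid, search_col, search_for, keep):
    rows = [r.split("\t") for r in grid.split("\n")]
    header, body = rows[0], rows[1:]
    col = header.index(search_col)
    rx = search_for[1:] if search_for.startswith("~") else re.escape(search_for)
    # keep=True retains matching rows; keep=False removes them.
    kept = [row for row in body if bool(re.search(rx, row[col])) == keep]
    return "\n".join("\t".join(r) for r in [header] + kept)

grid = "Item\tPrice\nAB12\t5.00\nTax\t0.40"
print(filter_rows(grid, "Item", "Tax", keep=False))
```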

 

SplitColumn

Splits values in a column into two separate columns.  One column contains the original data, and retains the first portion of the split, and a second column receives the second portion of the split.  This can be used in cases where one column contains two distinct pieces of information, such as a quantity and unit of measure.

 

Parameters:

FirstCol=colname - the column that contains the data to split, and retains the first portion of the split data

SecondCol=colname - the column to place the second portion of the split data

Splitter=number or character - defines how the split is performed

 

Splitter Options:

 

Positive number, the first portion is this many characters from the start of the data, the second portion is the remaining characters

Negative number, the second portion is this many characters from the end of the data, the first portion is the remaining characters

~^regex - a regular expression anchored to the front of the string - the first portion is the matched characters, the second portion is the remaining characters

~regex$ - a regular expression anchored to the end of the string - the second portion is the matched characters, the first portion is the remaining characters

~regex - an unanchored regular expression - the second portion is the matched characters, the first portion is the original value with the matched portion removed.  If there is a parenthesized group, that group's value is used as the second portion and the entire match is removed to form the first portion

delimiter - a string delimiter splits the value on that delimiter, and the first portion is the data before the delimiter, and the second portion is the data after the delimiter

 

Examples:

SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=-2) would split "123EA" into 123 and EA, by using the last 2 characters as the second portion.

 

SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~^\d+) would split "123EA" into 123 and EA, by matching one or more digits at the start of the string to be the first portion.

 

SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=~[A-Z]+$) would split "123EA" into 123 and EA, by matching one or more uppercase letters at the end of the string to be the second portion.

 

SplitColumn(FirstCol=Qty,SecondCol=UOM,Splitter=/) would split "123/EA" into 123 and EA, using "/" as a delimiter.
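
The splitter options can be compared side by side with a Python sketch that splits a single value.  It assumes an all-digit splitter is always read as a character count rather than a delimiter, and that a value with no regex match is left whole in the first portion.

```python
import re

def split_value(value, splitter):
    if re.fullmatch(r"-?\d+", splitter):
        n = int(splitter)  # positive: count from the start; negative: from the end
        return value[:n], value[n:]
    if splitter.startswith("~"):
        rx = splitter[1:]
        m = re.search(rx, value)
        if not m:
            return value, ""
        if rx.startswith("^"):
            return m.group(0), value[m.end():]  # front-anchored: match is the first portion
        first = value[:m.start()] + value[m.end():]  # original with the match removed
        second = (m.group(1) or "") if m.re.groups else m.group(0)
        return first, second
    first, _, second = value.partition(splitter)  # plain delimiter
    return first, second

print(split_value("123EA", "-2"))
```

Applied to the four examples above, each call yields the same pair of portions.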

 

 

 

StackedLineCleanup

This filter is deprecated, and only included for supporting existing jobs developed during the v10 beta cycle and still in use.