grid

Top  Previous  Next

gridobj=new("grid")

 

The grid object is designed to manage tab-separated-values data, enabling creation, manipulation, and formatting of the data in discrete cells, columns, and rows.  The object is used extensively by the Image Manager, but is available to other operations and rule set code blocks.  Grids can contain subheaders to represent pages of data, which is used by the Image Manager when working with OCR data from multiple pages.  A subheader is one with a column 1 format of [pagenum].  Some methods work with pages.  When navigating the grid, you can use the 'ispagehdr(row) method to test if a given row is a page header rather than a data row.

 

After instantiation, the data in the grid object is populated by the methods addcol(), addrow(), or parse().  Various methods are supplied to format and retrieve data by column, row, or as a tab-separated-values list.

 

Properties

 

cols contains the number of columns in the grid.  Set this value to quickly modify the number of columns in the grid.

rows contains the number of rows in the grid.  Set this value to quickly modify the number of rows in the grid.

dateorder$ sets the parsing order for date format options in the grid.  This can be mdy, dmy, or ymd.  When cell values are parsed, date segments (delimited by any non-digit character) are assumed to be in this sequence.

 

Methods

 

addcol(col$[,hasheader|header$])) adds a column to the grid, by parsing the col$ string for line-feed delimited values.  If hasheader is true (non-zero), the first row of data is treated as a header.  If header$ is provided, it is used as a header.  If neither hasheader or header$ is provided, the header for the column is null.

addrow(row$) adds a row to the grid, parsing the row$ string for tab-delimited cells.  Columns are added if needed.

deletecol(colnum) removes the column specified.

deletecolcell(col,row) removes the cell specified, shifting other rows in that column up.

deleterow(rownum) removes the row specified.

deleterowcell(col,row) removes the cell specified, shifting other columns in that row to the left.

formatdatecol(col[,order$[,format$]]) formats the column as date values.  If order$ is not provided, it uses the dateorder$ property.  If format$ is not provided, the format used is YYYY-MM-DD.  The format$ string can contain MM, DD, YY, and YYYY values to control the format.  Cells that do not have valid dates are set to null.

formatnumcol(col[,mask$]) formats the column as number values.  If mask$ is provided, it controls the formatting.  Otherwise raw numbers are shown.  Non-number cells are set to null.   Mask characters are replaced with text characters from the number string as follows:

 

# with a digit or space
0 with a digit or "0"
, with a thousands separator
. with a decimal character
- with a "-" if the number is negative, or space if positive, removing spaces between the - and the first digit if placed on the left of the mask
+ with a "+" if the number is positive, - if the number is negative

 

"#,###,##0.00-" will display 12345.67 as "12,345.67 ", or -124 as "124.00-".

formatstrcol(col,mask$) formats text data in a column using a string format mask.  Mask characters are replaced with text characters as follows:

 

0 with a digit
A with a letter, which is converted to uppercase
a with a letter
X with any character, letters converted to uppercase
x with any character
Z with any letter or digit, letters converted to uppercase
z with any letter or digit

 

"A0A-0A0" will convert x9za0b to "X9Z-A0B".

getarray(x${all}) fills x$[all] with the contents of grid.  X$[0:cols,0:rows].  Row 0 contains the column headers.  Column 0 is empty.

getcell$(col,row) returns the value of the specified cell.

getcol$(colnum|colname$[,withhdr]) returns a line-feed delimited list of values from the specified column, given its column number (1 to cols) or column header value as colname$.  If withhdr is true (non-zero), the first value will be the column header.

getcolnum(name$) returns the column number of the column name specified.  Column names are case-insensitive header values.

getpages$() returns a line-feed delimited list of page numbers in the grid.

getpagestext$([pagelist$[,withheader]])) returns all the text of the specified page numbers, or all pages if there is no pagelist$ specified or it is null.  Each page is prefixed with a [pagenum] header row, and if withheader is true (non-zero), each page's rows includes an initial header row.

getpagetext$(page$|page[,withhdr]) returns all text of a single page, specified with a string or number of the page number.  No page header is provided.  A column header row is provided if withhdr is true (non-zero).

getrow$(rownum) returns the tab-delimited cells of the specified row.

gettext$([withhdr[,trim]]) returns all the text of the grid.  If withhdr is true, the first row contains column headers.  If trim is true, trailing empty rows are removed.  There are no page headers included in the response.

initcol(colnum|colname$) sets all the specified column's cells to null.

initrow(rownum) sets all the specified row's cells to null.

insertcol(col$,colnum[,hasheader|header$]) inserts a column at the specified position, and fills the column with line-feed delimited row data from col$.  If hasheader is true, the first row is assumed to be the column header.  If header$ is supplied, it is used as the column header.

insertcolcell(col,row,value$) inserts a value at the cell specified, shifting rows in that column down.

insertrow(row$,rownum) inserts a row at the specified position, and fill is with tab-delimited column data from row$.

insertrowcell(col,row,value$) inserts a value at the cell specified, shifting columns in that row right.

ispagehdr(row) returns 1 if the row is a page header row.

joinrows([col$,pattern$ [,delim$]]) joins rows together by appending cells to the top row of any  row group.  Row groups separated by empty rows, where no cell has any data.  A row group header row can also be separated by a pattern that occurs in a column or columns.  The col$ value can contain one or more column names separated by commas or linefeeds.  A pattern is a literal text phrase, or a regular expression prefixed with ~.  Regular expression patterns are case-insensitive.  For example, to join rows where the first line of any row has a three-digit number in a LineNo column:  g'joinrows("LineNo","~\d{3}").  This function returns the number of row groups found.

 

If delim$ is supplied it separates rows that are joined.  If not supplied or null it defaults to a space.

multiplycols(col1,col2,colresult) multiplies numeric values in the col1 and col2 columns, and fills the colresult column with the result.

parse(txt$[,hasheader]) parses the tab-separated-values text content into the grid, replacing any former contents.  If hasheader is true, the first row of data is used for column headers.

removeemptyrows() removes all rows that have no data.

removepages() removes page headers from the grid.  Page headers are typically supplied by an ocr grid column zone, but removing them can make some grid data processing easier.

removerowsabove(search$[,includefoundrow]) searches rows (top down search) for a match, and if found, removes rows above the matching line.  If includefoundrow is true (non-0), the line found is also removed.  The search$ string can be a simple text string, in which case all columns are scanned.  It can contain a @collist suffix, where collist is a comma-separated list of column names or numbers to search.  Also, if the string begins with ~, then the search is considered a case-insensitive regular expression rather than a simple text value.  Use \@ and \~ to override these interpretations, and \\ to represent a backslash anywhere in the search string.

removerowsbelow(search$[,includefoundrow]) searches rows (bottom up search) for a match, and if found, removes rows below the matching line.  If includefoundrow is true (non-0), the line found is also removed.  The search$ string can be a simple text string, in which case all columns are scanned.  It can contain a @collist suffix, where collist is a comma-separated list of column names or numbers to search.  Also, if the string begins with ~, then the search is considered a case-insensitive regular expression rather than a simple text value.  Use \@ and \~ to override these interpretations, and \\ to represent a backslash anywhere in the search string.

 

Example: g'removerowsbelow("Total:@Price",1) would search the Price column for the string "Total:".  If a matching row is found, the rows below, and the matching row, will be removed.

setcell(col,row,value$) sets the value of the specified cell.

splitcol(source|source$,target|target$,cols) splits values in a source column into two values, and places the two values in the source and target columns.  If cols is positive, the first value is the first cols characters, the second is the remaining characters.  If cols is negative, the second value is the last cols characters, and the first value is remaining characters.

 

Source and target columns can be specified by column number or column name (both must be the same type).

splitcol(source|source$,target|target$,pattern$) splits values in a source column into two values, and places the two values in the source and target columns. If pattern$ stars with ~, the balance is a case-insensitve regular expression; otherwise, pattern$ is treated as a delimiter value.

 

If the regular expression is anchored to the start of the string (^pattern), the first value is the matching string, the second value is the remaining characters.  If it is anchored to the end of the string (pattern$), the first value is the string up to the match, the second value is the matching string.  if no anchor is used, the second value is the matching string, the first value is the string with the matched value removed.  If the pattern contains a parenthesized segment, the sub-match is used as the second value, and the full match is removed from the first value.

 

If a delimiter is specified, the first value is the characters before the delimiter, and the second value is the characters after the delimiter.

 

Source and target columns can be specified by column number or column name (both must be the same type).

subtractcols(col1,col2,colresult) subtracts numeric values in col2 from col1, and fills the colresult column with the result.

sumcol(colnum) returns the sum of all numeric values in the specified column.

sumcols(col1,col2[,col3],colresult) adds numeric values in col1 and col2, and the optional col3, and fills the colresult column with the result.

unstackcols(collist$[,targetcol|targetcol$][,pagebreaks]) modifies the specified columns, a linefeed- or comma-delimited list of column numbers or names.  For each row, if any cell in the column list is null, the non-null cells in that row are appended (with a space) to the previous non-empty row values for that cell. This method can be used to accumulate stacked description columns by using a control column and the description column.  Whenever the control column is null, the description column gets appended to the previous non-null description.

 

If a target column name or number is provided, then the values are appended to the target column rather than the collist$ column in which they are bound.  This can be used to split a stacked column into a two columns, one for the top row and the other for the rows below.

 

If pagebreaks is true (non-0), any page break resets the last non-empty row, so that each page is treated as an independent entity.  If false or not supplied, then continuation lines may be appended to lines of the previous page.  This enables stacked columns to cross page boundaries.