xmlreader

o=new("xmlreader"[,filename$|content$][,preparse])

 

The xmlreader class provides XML parsing capabilities to rule set code blocks.  XML is a common file format for transmitting data between applications.  It is a tag-based data format, where tags are used to identify data elements, as well as data attributes.  Data is arranged in hierarchical fashion, where content contains other content, such as a report, which contains customers, which contain invoices, which contain shipping and order details, and so on.  

 

The xmlreader navigates through this hierarchy using slash delimiters, similar to a file system directory path, returning data for a particular tag sequence.  The data may be atomic data, a complete element, an element attribute, or additional XML fragments that can be further parsed.  For example, if the root element is "documents", and it contains a number of "document" elements, each of which contains an "invno" tag, you can navigate to an invoice number as "/documents/document/invno".  If there are multiple occurrences of a particular element, you can get a count, and retrieve a particular element using a sequence number.  

 

Elements with sequences can be referenced using a sequence number in many methods (referring to the last element tag), or can be referenced with [n] suffixes anywhere in the path chain.  For example, "/documents/document[2]/invno" refers to the second document element's invno tag.
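
As a minimal sketch of this kind of navigation, the following code block counts the document elements and reads each invoice number using an [n] suffix in the path.  The file name and variable names are hypothetical, and the apostrophe syntax for calling object methods is assumed from the Business Basic conventions used in code blocks.

```basic
rem Sketch only: "invoices.xml" and all variable names are hypothetical
xml=new("xmlreader","invoices.xml")
rem Number of document elements under the documents root
doccount=xml'getcount("/documents/document")
for i=1 to doccount
  rem Build a path such as /documents/document[2]/invno
  docpath$="/documents/document["+str(i)+"]/invno"
  invno$=xml'getvalue$(docpath$)
next i
```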

 

Element references can also be designed to match attributes or values, using one of two syntaxes:

[attribute=value]

[<tag>=value]

 

For example, "/documents/document[OrderID=12345]/Address[<Type>=ShipTo]/City" would locate the document with an attribute "OrderID" of "12345", then find the Address element of that document with a tag <Type> whose value was "ShipTo", then locate the Address element's City tag.
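
The path above could be used with getvalue$() as sketched here.  The file name and variable names are hypothetical, and the apostrophe method-call syntax is an assumption.

```basic
rem Sketch only: "orders.xml" and city$ are hypothetical names
xml=new("xmlreader","orders.xml")
rem City of the ShipTo address for the document whose OrderID attribute is 12345
city$=xml'getvalue$("/documents/document[OrderID=12345]/Address[<Type>=ShipTo]/City")
```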

 

If a valid filename$ value is supplied, the file is loaded when the object is first instantiated.  Alternatively, the argument can be an XML document string (if the value doesn't exist as a file, it is assumed to be XML content).  At any time, a new file can be loaded or the XML content provided directly as a string.

 

If the preparse option is supplied and true (not 0), the entire document is parsed into a keyed file structure, which is used for direct element retrieval (not for attribute or value searches).  In cases where the document will be fully accessed during its lifetime, this can provide a significant performance improvement, depending on how frequently elements are retrieved, particularly those in deeply nested or large structures.
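
The loading options might be combined as in the sketch below.  The file names and content string are hypothetical.  The apostrophe method-call syntax, calling a method as a standalone statement, and the need to call preparse() again after replacing the content are all assumptions rather than documented behavior.

```basic
rem Sketch only: file names and content are hypothetical
rem Load a file and preparse it (third argument is true)
xml=new("xmlreader","invoices.xml",1)
rem Load a different file into the same object
xml'loadfile("credits.xml")
rem Or supply the XML content directly as a string
xml'setcontent("<documents><document><invno>1001</invno></document></documents>")
rem Assumption: the preparse table is rebuilt explicitly for the new content
xml'preparse()
```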

 

 

Properties

body$ contains the non-header portion of the file, with any comments removed.

content$ is the string representation of the XML data.  When a file is loaded, content$ is filled with its data.

encoding$ contains the character encoding of the file.  The default encoding is utf-8, which is a variable-width encoding of Unicode, but files may use other encodings, such as UTF-16 or iso8859-1.  Since UnForm's normal character set is iso8859-1 (or the 9J symbol set), it is often necessary to specify a target encoding when retrieving values for UnForm printing.

filename$ is the name of the most recently loaded file.

header$ contains the header portion of the file, such as <?xml version="1.0"?>.

root$ contains the name of the root element.  All XML documents have a root element that is the parent of all other elements.

version$ contains the XML version number from the header.
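
A minimal sketch of reading these properties after a file is loaded follows.  The file name and variable names are hypothetical, and the apostrophe syntax for property access is an assumption.

```basic
rem Sketch only: "invoices.xml" and the variable names are hypothetical
xml=new("xmlreader","invoices.xml")
rem Name of the root element, for example "documents"
rootname$=xml'root$
rem Character encoding and XML version taken from the header
encname$=xml'encoding$
xmlversion$=xml'version$
```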

 

 

Methods

flatten(arr$[all] [,nocase [,toencoding$]]) fills the associative array arr$[all] with a flattened (one dimensional) version of the xml content, where each key to the array is an element path starting from root, and the associated values are the most nested element values.  Keys use "[index]" suffixes to indicate a 1-based repeating element array.  Attributes are added to arr$ with keys suffixed with "[attr$]".  Values are entity-decoded automatically.

 

If nocase is true (non-0), all keys are forced to lowercase.  In rare cases, this could result in overwriting of keys and lost data, since xml element names and attribute names are case sensitive.

 

If toencoding$ is specified, values are converted to the defined encoding, such as "cp1252" or "8859-1".

 

Note the order of elements in arr$ is undefined.  An index loop will not return results in any particular order.  Typically, loops look for the existence of a key in the array, and exit if the key is not found.  Details about associative arrays can be found in the Basic Syntax chapter.

 

See the example below.

getattr$(element$[,sequence],attrname$) returns the attribute value of the specified element.  If a sequence is provided, the specified occurrence of the element is used.

getattrs$(element$[,sequence]) returns a list of element attributes, which are name-value pairs.  Each pair is delimited by a linefeed ($0A$), and the name and value are delimited by a tab ($09$).  If a sequence is provided, the specified occurrence of the element is used.

getchild$(element$[,sequence],child) returns a specified child element (which may itself contain nested child elements).  If the sequence is provided, then it applies to the selection of the parent element, not the child element.  The child argument is used to determine which child element to return.  getchild$("/documents/document",5,1) returns the 1st child element of the 5th documents/document element.

getchildname$(element$[,sequence],child) returns the name of a specified child element.  If the sequence is provided, then it applies to the selection of the parent element, not the child element.  The child argument is used to determine which child element name to return.  getchildname$("/documents/document",5,1) returns the name of the 1st child element of the 5th documents/document element.

 

Note this method was added in 9.0.21 to assist with updates due to a former bug in the getchild$() method, which was returning a name, rather than the correct, and documented, full element.  If you relied on getchild$() to return just a name, use getchildname$() instead.

getchildren$(element$[,sequence]) returns a linefeed ($0A$) delimited list of element names that are children of the element provided.  If there are no children, then null is returned, and the value of the element is an atomic data value. If a sequence is provided, the specified occurrence of the element is used.

getchildren(element$[,sequence]) returns the number of child elements that are found under the element. If a sequence is provided, the specified occurrence of the element is used.  The sequence applies to the element, not the children.  getchildren("/documents/document",5) returns the number of child elements found under the 5th document element under the documents root element.

getcount(element$) returns the number of times a given element occurs.  This can be used to determine how many sequences of a particular element are available.

getelement$(element$[,sequence]) returns the element specified by element$, including its enclosed content.  If a sequence is provided, the specified occurrence of the element is used.

getns$(element$[,sequence],uri$) scans the attributes of an element for an XML namespace declaration that matches the uri$ provided.  It returns the namespace name, which can then be used to access elements that require a namespace prefix that associates a name with a URI.

getvalue$(element$[,sequence[,toencoding$[,entitydecode]]]) returns the inner value of the specified element.  If the element holds atomic data, the result is a string data value.  Note that an element may contain sub-elements.  If a sequence is provided, the specified occurrence of the element is used.  If toencoding$ is provided, then the value is translated from the XML document's encoding to the specified encoding.  A common toencoding$ value might be "iso8859-1" or "9j", to translate to UnForm's default 8-bit text encoding.

 

If entitydecode is provided and true (non-0), the value is entity-decoded.

loadfile(filename$) sets the XML content by reading filename$.

parseattribute$(element$,name$) returns the entity-decoded value of the named attribute from the element string.  If the attribute doesn't exist, null is returned.

parseelement(element$,tag$,value$,attributes$) parses the element string into components and fills the tag$, value$, and attributes$ values with the element's tag name, value, and attribute string, which is a linefeed-delimited list of tab-delimited name-value pairs as found in the element's main tag. The attribute values are entity-decoded.

 

The value string can be empty, a text value, or a child element.  Returns 1 on success, 0 on failure.  The value is not entity-decoded, to retain child element encoding.

preparse() parses the current XML content into an internal table for faster element retrieval, at the cost of the initial parsing time, which is roughly linear in the number of elements in the entire document.

setcontent(content$) sets the XML content to content$.
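
Several of the methods above can be combined to walk the children of an element, as in this sketch.  The file name and variable names are hypothetical, the apostrophe method-call syntax is an assumption, and the sketch assumes the file holds at least five document elements.

```basic
rem Sketch only: "invoices.xml" and all variable names are hypothetical
xml=new("xmlreader","invoices.xml")
rem Number of child elements under the 5th document element
kids=xml'getchildren("/documents/document",5)
for c=1 to kids
  rem Name of child c, then the full child element including its tags
  childname$=xml'getchildname$("/documents/document",5,c)
  child$=xml'getchild$("/documents/document",5,c)
  rem Split the element string into tag name, value, and attribute list
  ok=xml'parseelement(child$,tag$,value$,attrs$)
next c
rem Value of the first invno, translated to iso8859-1 and entity-decoded
invno$=xml'getvalue$("/documents/document/invno",1,"iso8859-1",1)
```

Because parseelement() does not entity-decode the value, a value that holds a child element can be passed to parseelement() again to parse the next level.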

 

 

Example of flatten method

 

Given this xml:

 

<Rows>
        <Row id="row1">
                <Col>Val 1.1</Col>
                <Col>Val 1.2</Col>
        </Row>
        <Row id="row2">
                <Col>Val 2.1</Col>
                <Col>Val 2.2</Col>
                <Col>Val 2.3</Col>
        </Row>
</Rows>

 

These are the keys and values of the array, sorted for clarity:

 

Rows/Row[1][id]=row1
Rows/Row[1]/Col[1]=Val 1.1
Rows/Row[1]/Col[2]=Val 1.2
Rows/Row[2][id]=row2
Rows/Row[2]/Col[1]=Val 2.1
Rows/Row[2]/Col[2]=Val 2.2
Rows/Row[2]/Col[3]=Val 2.3