Transformation Rules

Figure 3.6 shows the transformation and adaptation rules that are applied to the GMX web page whenever a WAP-enabled client accesses it. A rule group called gmx.mobile is defined (see line 32); it specifies two rules: removeEntities and extractLoginFormAndMenu. The removeEntities transformation rule (lines 2-4) contains a <searchFor> and a <replaceWith> element. Together, these elements provide text-based search-and-replace functionality that supports regular expressions. In our example, the rule strips XML entities (&..;) that the Saxon XQuery processor cannot process.
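The effect of such a search-and-replace rule can be illustrated with a minimal Python sketch. The regular expression and the function name remove_entities are assumptions made for illustration only; the actual expression used in Figure 3.6 is not reproduced here and may differ.

```python
import re

# Assumed pattern: a named or numeric entity reference such as &nbsp; or &#160;.
# This is an illustrative guess, not the expression from the rule file.
ENTITY_PATTERN = re.compile(r"&#?[A-Za-z0-9]+;")


def remove_entities(markup: str, replace_with: str = "") -> str:
    """Replace every entity reference in markup with replace_with."""
    return ENTITY_PATTERN.sub(replace_with, markup)


if __name__ == "__main__":
    print(remove_entities("<p>Login&nbsp;&raquo; GMX</p>"))
```

Note that this sketch also removes the predefined XML entities (for example &amp;), which an XML processor could handle; a production rule would probably be more selective.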

The transformation rule called extractLoginFormAndMenus (line 5) is defined by an <xquery> element. This element has one optional attribute named preTransform (line 6). When it is set to true (the default value), the input data is treated as not well-formed XML (i.e., HTML) and is converted to well-formed XHTML content, a process often called tidying, before the XQuery stylesheet is applied. Because we did not want to implement the XQuery script directly in the rule body, we use the <import> element, which allows an external XQuery stylesheet file (gmx2html-table.xql, line 7) to be imported. The complete listing of this script can be found in the appendix (see Figure A.3). It is similar to the script used for content adaptation for mobile clients, but it delivers more than the login form: it also shows some "important" menus and produces HTML code instead of WML output.
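To make the tidying step more concrete, the following Python sketch re-serializes lenient HTML as well-formed XML using only the standard library. It is a heavily simplified stand-in and not the tidying component used by FOXY, which is not named in this section; the class name Tidy, the function tidy, and the list of void elements are assumptions.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content; assumed subset, sufficient for the sketch.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Tidy(HTMLParser):
    """Re-serialize lenient HTML as well-formed XML (very simplified)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # serialized output fragments
        self.open_tags = []  # stack of elements that still need a close tag

    def handle_starttag(self, tag, attrs):
        # Quote every attribute value; valueless attributes repeat their name.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag or void element: ignore it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()
        # Close whatever is still open at the end of the input.
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def tidy(html: str) -> str:
    parser = Tidy()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    xhtml = tidy("<form name=login><input type=text name=user><br>Password<p>Hi</form>")
    print(xhtml)
    # An XML parser now accepts the result.
    print(ElementTree.fromstring(xhtml).tag)
```

The sketch ignores comments, doctypes, duplicate attributes, and documents with more than one root element; it only shows why a query processor that expects XML needs this step before it can run.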

The extractLoginForm transformation rule is also implemented as an XQuery script. This time, the xquery element has one child named script (lines 13-28). This element indicates that the XQuery script is implemented directly in the rule database. The XQuery code (lines 15-26) is quite straightforward: first, an HTML form with the name "login" is extracted; then it is wrapped in a WML card element and finally presented as a WML document (<wml>, <!DOCTYPE wml...). Because WAP browsers check the Content-Type header field and produce an error message whenever HTML content is detected, the value of this field must be changed to indicate that WML content is delivered (i.e., text/vnd.wap.wml, line 11). Figures 3.4 and 3.5 show screenshots of the transformations as seen on a traditional browser and a WAP phone, respectively.

Suppose that the information extracted from the web page is large and needs to be split over a number of smaller pages. In this case, the splitting elements <foxy:group> and <foxy:subgroup> are inserted into the extracted content by means of XQuery or XSL instructions. The <layoutPage> element can then be used within the rule implementations to browse between the resulting page splits.

Note that FOXY is especially effective for web sites that use a common layout (i.e., corporate identity), because the same HTTP request pattern and transformation rules can be applied to a large number of web pages.
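The three steps of the extractLoginForm rule (extract the form named "login", wrap it in a WML card, and change the Content-Type) can be approximated by the following Python sketch. It is not the XQuery script from Figure 3.6. The function name to_wml, the card attributes, and the choice of the WML 1.1 document type are assumptions; the section elides the actual declaration. The input is assumed to be tidied XHTML without an XML namespace.

```python
from xml.etree import ElementTree as ET

# Assumed document type (WML 1.1); the original declaration is not shown.
WML_DOCTYPE = (
    '<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" '
    '"http://www.wapforum.org/DTD/wml_1.1.xml">'
)


def to_wml(xhtml: str, form_name: str = "login"):
    """Return (headers, body) with the named form wrapped in a WML card."""
    root = ET.fromstring(xhtml)
    # Step 1: extract the form with the requested name.
    form = next(
        (f for f in root.iter("form") if f.get("name") == form_name), None
    )
    if form is None:
        raise ValueError(f"no form named {form_name!r}")
    form.tail = None  # drop text that followed the form in the source page
    # Step 2: wrap the form in a WML card (id and title are hypothetical).
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": form_name, "title": "Login"})
    card.append(form)
    body = "\n".join(
        ['<?xml version="1.0"?>', WML_DOCTYPE, ET.tostring(wml, encoding="unicode")]
    )
    # Step 3: announce WML instead of HTML to the WAP browser.
    headers = {"Content-Type": "text/vnd.wap.wml"}
    return headers, body


if __name__ == "__main__":
    page = (
        "<html><body><div>News</div>"
        '<form name="login" action="/login"><input name="user"/></form>'
        "</body></html>"
    )
    headers, body = to_wml(page)
    print(headers)
    print(body)
```

Everything outside the login form is discarded, which is what keeps the page small enough for a WAP phone.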

Figure 3.4: The result of the transformation as seen on a traditional PC browser (i.e., transformation gmx.browser)

root 2006-05-22