Problem Definition

Today, the Internet no longer has just one language. Much effort has gone into improving HTML; for example, XHTML[4], the eXtensible HyperText Markup Language, a reformulation of HTML in XML, was introduced mostly to overcome problems caused by malformed markup.

XHTML was a first step, but it was still insufficient. There was general consensus that only a clear separation of content and style would meet the needs of all the new stakeholders and techniques involved - be it the automated extraction of information or the delivery of this information to new devices or users.

The World Wide Web Consortium's (W3C) eXtensible Markup Language (XML[5] - a strict subset of SGML[6], the Standard Generalized Markup Language) is a meta-language for defining new markup languages. In recent years it has proved to be the right language not only for defining standard device languages and formats (e.g., XHTML, WML), but also for creating all kinds of ''self-describing'' data. Furthermore, in combination with XSL[7], a language for transforming (XSLT) and formatting (XSL-FO) documents, the desired separation of meaning and view has now been achieved, at least in theory.

With the help of XML and XSLT it is now possible to create different views of the same underlying content: by applying different stylesheets to one and the same XML data file, we obtain several ''viewable'' versions of the XML data document - e.g., an XHTML version for the web and a WML page for (older) mobile browsers. This keeps the content consistent across versions, although the documents will still be published under different public (HTTP[8]) addresses (e.g. '''' and '''').
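The ''one source, several views'' idea can be illustrated with a minimal sketch. It does not use XSLT: two plain Python functions stand in for the two stylesheets, and the content document with its element names (article, title, para) is a hypothetical example, not taken from this project.

```python
import xml.etree.ElementTree as ET

# Hypothetical content document; the element names are illustrative only.
SOURCE = """<article>
  <title>Weather</title>
  <para>Sunny in the morning.</para>
  <para>Rain in the evening.</para>
</article>"""


def to_xhtml(doc: ET.Element) -> str:
    """Render the content as a minimal XHTML page (stand-in for an XHTML stylesheet)."""
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = doc.findtext("title")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = doc.findtext("title")
    for para in doc.findall("para"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_wml(doc: ET.Element) -> str:
    """Render the same content as a single-card WML deck (stand-in for a WML stylesheet)."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": "main", "title": doc.findtext("title")})
    for para in doc.findall("para"):
        ET.SubElement(card, "p").text = para.text
    return ET.tostring(wml, encoding="unicode")


doc = ET.fromstring(SOURCE)
xhtml_view = to_xhtml(doc)  # view for desktop browsers
wml_view = to_wml(doc)      # view for WAP browsers
print(xhtml_view)
print(wml_view)
```

Both views are derived from the same parsed document, so a change to the source is reflected in every output format.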

For many legacy web pages that were not created from an XML source file, however, a redesign from scratch (either in the way described above or in a ''mobile'' language such as WML) did not seem feasible. Therefore, the task of converting these documents ''on demand'' became more and more important.

The main idea behind this project was to dynamically transform HTML pages into whatever format the requesting client (browser) is able to understand. We were originally looking for a new way of delivering web content to wireless devices, i.e., a transparent, flexible and extensible way to transform documents written in HTML into various output formats (e.g., XHTML or WML). The solution we found not only satisfied this main aim; it has also proved suitable for (at least some of) the requirements of many other popular problems in the emerging field of web engineering.
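How a server might decide which format a client understands can be sketched as follows. This is only an assumption about one possible mechanism (inspecting the HTTP Accept header); the function name, the format labels and the fallback are hypothetical, and quality values (q=...) are deliberately ignored to keep the sketch short.

```python
# Media types mapped to the output formats mentioned in the text.
KNOWN_FORMATS = {
    "text/vnd.wap.wml": "wml",
    "application/xhtml+xml": "xhtml",
    "text/html": "html",
}


def choose_output_format(accept_header: str, default: str = "html") -> str:
    """Return the first output format the client lists in its Accept header.

    Simplification: entries are taken in the order given by the client;
    q-values are stripped and not evaluated.
    """
    for entry in accept_header.split(","):
        media_type = entry.split(";")[0].strip().lower()
        if media_type in KNOWN_FORMATS:
            return KNOWN_FORMATS[media_type]
    return default


wap_format = choose_output_format("text/vnd.wap.wml, image/vnd.wap.wbmp")
desktop_format = choose_output_format("text/html,application/xhtml+xml,*/*;q=0.8")
unknown_format = choose_output_format("image/png")
print(wap_format, desktop_format, unknown_format)
```

A production system would additionally have to honour q-values and would probably consult the User-Agent header as well, since not every device announces its capabilities reliably.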

In the course of our work, we defined (in XML Schema[9]) two new XML languages:
The first language serves to inspect HTTP requests (what kind of device is used, which page is requested, etc.) and to define what should be done if a web request meets certain criteria, i.e., whether transformation rules should be applied to the corresponding HTTP response from the web server and, if so, which ones. The second language specifies how this transformation is carried out. It not only offers ways to import or embed XSL and XQuery scripts; it also provides simpler search-and-replace and more complex page-splitting functionality. For ''common'' transformations (e.g., converting an HTML form to a WML form, extracting links, and many more) we created several predefined stylesheets, so that users may not have to deal with XSL or XQuery themselves.
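The division of labour between the two languages can be sketched in a few lines. The sketch below is not the XML syntax of either language, which is not shown here; it only mirrors the described split between ''when to transform'' (a rule matched against the request) and ''how to transform'' (a list of steps applied to the response). All names (Rule, search_replace, split_page, apply_rules) and the sample patterns are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

Transformation = Callable[[str], str]


@dataclass
class Rule:
    """'When' part: criteria a request must meet, plus the steps to apply."""
    user_agent_pattern: str
    path_pattern: str
    transformations: List[Transformation] = field(default_factory=list)

    def matches(self, user_agent: str, path: str) -> bool:
        return bool(re.search(self.user_agent_pattern, user_agent)
                    and re.search(self.path_pattern, path))


def search_replace(pattern: str, replacement: str) -> Transformation:
    """'How' part: a simple regex-based search-and-replace step."""
    return lambda page: re.sub(pattern, replacement, page)


def split_page(text: str, max_chars: int) -> List[str]:
    """'How' part: split text into parts of at most max_chars, on word boundaries.

    A single word longer than max_chars is kept whole in its own part.
    """
    parts, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            parts.append(current)
            current = word
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def apply_rules(rules: List[Rule], user_agent: str, path: str, page: str) -> str:
    """Apply the steps of the first matching rule; leave the page as-is otherwise."""
    for rule in rules:
        if rule.matches(user_agent, path):
            for transform in rule.transformations:
                page = transform(page)
            return page
    return page


rules = [
    Rule(r"Nokia|WAP", r".*", [
        search_replace(r"<br\s*/?>", "<br/>"),     # make line breaks well-formed
        search_replace(r"</?font[^>]*>", ""),      # drop presentational markup
    ]),
]
page = '<p><font color="red">Hi</font><br>there</p>'
mobile_page = apply_rules(rules, "Nokia7110/1.0", "/index.html", page)
desktop_page = apply_rules(rules, "Mozilla/5.0", "/index.html", page)
parts = split_page("one two three four", 9)
print(mobile_page, desktop_page, parts)
```

Keeping the matching criteria separate from the transformation steps means that the same steps can be reused by several rules.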

This layer of abstraction - together with the fact that we kept these configuration languages simple and self-explanatory, yet quite powerful - not only reduces the effort of developing graphical configuration user interfaces (whether online or realized as a browser plug-in), it also provides a readily understandable way to convert web content on demand.

root 2006-05-22