Automated Information Extraction

FOXY can be used to extract the interesting parts of web pages. In the future, it could be combined with a (specialized) crawler and/or a post-processor to carry out automated information extraction tasks. The crawler, for example, could request all web pages through the FOXY proxy server. FOXY could then be configured to extract all forms, and only the forms, from every page requested through it. This would reduce the parsing and extraction effort of a possible post-processor, whose purpose is to interpret the extracted information. The configuration of the crawler, FOXY, and the post-processor could be combined in a graphical user interface (see section 4.4.1).
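To make the form-only scenario more concrete, the following Python sketch shows what such a filtering step might look like in isolation. It is not FOXY's implementation or configuration syntax: the class FormExtractor and the function extract_forms are hypothetical names, and the sketch assumes that forms are not nested and that the page is reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Hypothetical filter: keeps the markup of every <form> element, drops the rest."""

    def __init__(self):
        # Keep character references unconverted so the markup is reproduced as written.
        super().__init__(convert_charrefs=False)
        self.forms = []      # completed form fragments
        self._parts = None   # pieces of the form currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self._parts is None:
            self._parts = []
        if self._parts is not None:
            self._parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <input ... /> inside a form.
        if self._parts is not None:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._parts is None:
            return
        self._parts.append(f"</{tag}>")
        if tag == "form":
            self.forms.append("".join(self._parts))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._parts is not None:
            self._parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._parts is not None:
            self._parts.append(f"&#{name};")


def extract_forms(html):
    """Return a list with the markup of each form found in the given page."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms
```

A post-processor would then receive only the returned fragments, for example the result of extract_forms(page) for each page the crawler requested, instead of having to parse complete pages itself.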

root 2006-05-22