Web Information Extraction with Lixto: Visual Logic and Expressive Power


Georg Gottlob, Vienna University of Technology (TU Wien)


Intelligent Web agents and Web information agents need structured data from disparate Web sources for decision making. Wrapper technology extracts structured data from unstructured or poorly structured Web pages whose content changes continually. In this talk I will give a survey of the Lixto approach to Web data extraction. Lixto assists a user in semi-automatically creating wrapper programs by providing a fully visual and interactive user interface. Lixto wrappers are able to extract deeply nested XML data structures from HTML pages. Visual user operations on example pages are directly translated into logical conditions and rules in a declarative logic-based language. Basic features of this system will be demonstrated, and theoretical results about its expressive power will be discussed. Time permitting, we will also discuss some more advanced features of the system and some industrial applications. Papers on Lixto are available at www.lixto.com (downloads section). This talk describes joint work with Robert Baumgartner, Sergio Flesca, Marcus Herzog, and Christoph Koch.