Web Information Extraction with Lixto: Visual Logic and Expressive Power
Georg Gottlob, Vienna University of Technology (TU Wien)
Web agents and Web information agents need structured data from disparate Web
sources for decision making. Wrapper technology is used to extract structured
data from unstructured or poorly structured Web pages whose content changes
continually. In this talk I will give a survey of the Lixto approach to Web data
extraction. Lixto assists a user in semi-automatically creating wrapper programs
by providing a fully visual and interactive user interface. Lixto wrappers are
able to extract deeply nested XML data structures from HTML pages. Visual user
operations on example pages are translated directly into logical conditions and
rules in a declarative logic-based language. Basic features of this system will
be demonstrated, and theoretical results about its expressive power will be discussed.
Time permitting, we will also discuss some more advanced features of the system
and some industrial applications. Papers on Lixto are available at www.lixto.com
(downloads section). This talk describes joint work with Robert Baumgartner,
Sergio Flesca, Marcus Herzog, and Christoph Koch.