HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Seminar
2009 Series

Quality Information Retrieval from the Internet For Document Ranking, Web Search Engines, and Crawler Designs (Seminar 3 of 4)

Dr. Markus Hagenbuchner
Department of Computer Science
Hong Kong Baptist University

Date: November 9, 2009 (Monday)
Time: 2:30 - 4:00 pm
Venue: SCT909, Cha Chi Ming Science Tower, Ho Sin Hang Campus

Abstract
The Internet is the world’s largest distributed repository of electronic documents. It is a very large and dynamic source of information, growing exponentially at a rate of approximately 2.68 million new web sites per month. Finding information on the Internet is therefore becoming an ever more difficult task. Internet search engines partially solve this problem by regularly creating an index of relatively large portions of the web, a task that requires retrieving web pages from the Internet. Given that the Internet is estimated to contain 11.5 billion pages, and given its assumed dynamics, it is paramount for any search engine to crawl ever-growing sets of web pages either continuously or at frequent intervals. Consequently, crawling needs to be done very efficiently.
Moreover, searching through a large repository brings a number of challenges such as:
• How can one find a required document or information? Searching through a distributed system of Web servers is inefficient.
• As the World Wide Web grows, it becomes increasingly likely that multiple documents match a search criterion. How can the most appropriate documents be selected?

This seminar will introduce search engines, indexing, and crawling, and will present new ways of incorporating cognitive processes into document assessment. The approach allows the automated assessment of information quality. It will be shown that incorporating such an approach into information retrieval systems improves the Web search experience when compared to Google and Yahoo.

Dr Markus Hagenbuchner will present the seminar. A case study will be presented by guest speaker Dr Wei-Tsen Milly Kc from the University of Wollongong, Australia.

Biography
Markus Hagenbuchner holds a PhD (Computer Science, University of Wollongong, Australia). He is currently a senior lecturer in the School of Computer Science and Software Engineering at the University of Wollongong, Australia. He joined the machine learning research area in 1992, began focusing his research on Neural Networks for the graph-structured domain in 1998, and pioneered the development of Self-Organizing Maps for structured data. His contribution to the development of a Self-Organizing Map for graphs led to winning the international competition on document mining on several occasions.

He is a team leader of the machine learning group at the University of Wollongong. He was co-chair of the AI-08 conference and is a program committee member for ANNPR 2010. He has been a reviewer of international standing for the Australian Research Council since 2004, and has been an invited guest speaker at various international venues. His current research interest is in the development of supervised and unsupervised machine learning methods for the processing of complex data structures in data mining applications.

********* ALL INTERESTED ARE WELCOME ***********
(For enquiry, please contact Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=114&lang=tc