HONG KONG BAPTIST UNIVERSITY
FACULTY OF SCIENCE

Department of Computer Science Colloquium
2015 Series

Graph Sampling Algorithms and Their Applications in Online Social Network Analysis

Prof. Jianguo Lu
Professor, School of Computer Science
University of Windsor
Canada

Date: June 24, 2015 (Wednesday)
Time: 4:00 - 5:00 pm
Venue: SCT501, Cha Chi Ming Science Tower, Ho Sin Hang Campus

Abstract
Data, especially online deep web data such as online social networks, are often large and hidden behind searchable interfaces. Discovering the properties and patterns of such hidden data sources is an important and challenging problem. Many data sources like online social networks can be modeled as graphs. In the setting of graph sampling, I will introduce various sampling methods and their performance in estimating aggregate as well as structural properties of a graph. In particular, I will show several interesting results: 1) The norm in sampling practice is to use uniform random samples whenever possible. We prove that, on the contrary, PPS (Probability Proportional to Size) sampling outperforms uniform random sampling for scale-free networks, especially when the data is big. 2) For multiple capture-recapture sampling, we demonstrate the existence of a bias in a traditional method, correct the bias analytically, and verify the result empirically.

I will also introduce several applications of sampling in online social network analysis. 1) Using a star sampling method developed for this purpose, we find that some follower numbers are artificially inflated; 2) We detect millions of spammers by identifying near-duplicates using a sampling method. In particular, we analyze and compare the Weibo and Twitter user networks, highlighting their structural differences. Of particular interest is that Hong Kong has the lowest mutual information with most provinces in China, even lower than Taiwan, coinciding with its tenuous relationship with mainland China.

Biography
Jianguo Lu is a professor of Computer Science at the University of Windsor in Canada. He obtained his Bachelor's, Master's, and Ph.D. degrees in computer science from Nanjing University. Before joining the University of Windsor in 2002, he worked as a research associate at the University of Toronto and as a visiting scientist at Carnegie Mellon University and the University of Tokyo. In recent years, he has worked on graph sampling, deep web sampling and crawling, and online social network analysis. His recent publications have appeared in journals such as Information Retrieval (IR), Information Processing and Management (IPM), IEEE Transactions on Knowledge and Data Engineering (TKDE), and Data and Knowledge Engineering (DKE).

********* ALL INTERESTED ARE WELCOME ***********
(For enquiries, please contact the Computer Science Department at 3411 2385)

http://www.comp.hkbu.edu.hk/v1/?page=seminars&id=338