Tutorials
Hands-On Time-Series Analysis with Matlab
Michail Vlachos and Spiros Papadimitriou
IBM T.J. Watson Research Center, Hawthorne, NY, 10532
Predictive Learning on Data Streams
Haixun Wang 1, Ying Yang 2
1 IBM T. J. Watson 2 Monash University
Filtering of Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools
Dr. Mark Last 1, Prof. Abraham Kandel 2, Alex Markov 3, Dror Magal 4
1 Ben-Gurion University 2 NIACI 3 BGU 4 Meged
Data Mining for Social Network Analysis
Jaideep Srivastava, Nishith Pathak and Sandeep Mane
University of Minnesota
Abstracts |
Hands-On Time-Series Analysis with Matlab
Michail Vlachos and Spiros Papadimitriou
IBM T.J. Watson Research Center, Hawthorne, NY, 10532
Abstract
Time series are probably the most prevalent form of data storage and representation in most scientific fields. Examples include industrial and environmental measurements, medical monitoring, and stock market analysis. However, to search and explore the ever-increasing amount of collected data efficiently, one needs to deploy intelligent techniques for data compression and representation, data organization and pruning, and similarity characterization. This tutorial will provide a unified, geometric view of data representation techniques. Furthermore, it will demonstrate how these tasks can be performed within the Matlab programming language and software environment, which is readily available at many academic institutions.
Predictive Learning on Data Streams
Haixun Wang 1, Ying Yang 2
1 IBM T. J. Watson,
2 Monash University
Abstract
Many applications deal with data whose characteristics change. One distinguishing trait that sets streaming data apart from disk-stored data is that streaming data usually exhibits time-changing characteristics. Because decision making relies on how up to date its supporting data is, the evolving nature of the data creates tremendous complexity for many mining algorithms. Thus, how to make predictive learning more effective and efficient in view of changing data characteristics has become a major challenge in a wide range of application domains, including network monitoring, biosurveillance, and Web data mining.
Filtering of Multi-Lingual Terrorist Content with Graph-Theoretic Classification Tools
Dr. Mark Last 1, Prof. Abraham Kandel 2, Alex Markov 3, Dror Magal 4
1 Ben-Gurion University,
2 NIACI,
3 BGU,
4 Meged
Abstract
The military pressure put on the al-Qaeda leadership in Afghanistan after 9/11 has dramatically increased the role of the internet in the infrastructure of global terrorist organizations (Corera, 2004). Beyond propaganda and ideology, jihadist sites seem to be heavily used for practical training in kidnapping, explosive preparation, and other "core" terrorist activities, which were once taught in Afghan training camps. The former US Deputy Defense Secretary Paul D. Wolfowitz, in testimony before the House Armed Services Committee, called such Web sites "cyber sanctuaries" (Lipton and Lichtblau, 2004). Of course, al-Qaeda is not the sole source of terror-related websites. According to a recent estimate, the total number of such websites has increased from only 12 in 1997 to around 4,300 in 2005 (Talbot, 2005).
This growth is why automated data and text mining approaches are so important to the cyber war against international terror.
In this tutorial, we will review the application of several data and text mining techniques to terrorist detection and monitoring on the web. The specific areas covered will include web content mining, trend discovery, web information agents, and activity monitoring.
Data Mining for Social Network Analysis
Jaideep Srivastava, Nishith Pathak and Sandeep Mane
University of Minnesota
Abstract
A social network is defined as a social structure of individuals who are related to each other, directly or indirectly, through a common relation of interest, e.g. friendship or trust. Social network analysis is the study of social networks to understand their structure and behavior. It has gained prominence due to its use in different applications, from product marketing (e.g. viral marketing) to search engines and organizational dynamics (e.g. management). In the last year there has been a rapid increase in interest in social network analysis within the data mining community. The basic motivation is the demand to extract knowledge from the copious amounts of data collected on the social behavior of users in online environments. A prime example is the body of research dedicated to the Enron email dataset. Data mining based techniques are proving to be useful for the analysis of social network data, especially for large datasets that cannot be handled by traditional methods.
This tutorial will provide an up-to-date introduction to the increasingly important field of data mining in social network analysis, as well as an overview of research directions in this field. It will cover the most representative research activities and directions in data mining based social network analysis techniques. We first provide an introduction to social network analysis and then survey the research in this field from the perspective of three different disciplines, namely social sciences, computer science and physics. Next, an overview of emerging research in data mining for social network analysis is presented. This tutorial will help researchers by surveying the research to date, showing how data mining can be useful for social network analysis, and motivating them to pursue new research in this field. It will also help practitioners from industrial organizations understand how data mining based techniques can help them harness the power of social networks.