The 2006 IEEE / WIC / ACM International Conference on Data Mining (ICDM 2006)

ICDM / WI / IAT Joint Speaker

Service-Oriented Science: Scaling eScience Impact

Prof. Ian Foster

Computation Institute, Argonne National Laboratory & University of Chicago

Webpage: http://www.ci.uchicago.edu/

ICDM Speakers

Exploratory Mining in Cube Space

Prof. Raghu Ramakrishnan

Yahoo! Research

Data mining methods for modeling gene expression regulation and their applications

Prof. Weixiong Zhang

Department of Computer Science and Engineering, Washington University in St. Louis

Website: http://www.cse.wustl.edu/~zhang

Service-Oriented Science: Scaling eScience Impact

Prof. Ian Foster
Director, Computation Institute, Argonne National Laboratory & University of Chicago

Abstract

Computational approaches to problem solving have proven their worth in many fields of science, allowing the collection and analysis of unprecedented quantities of data, and the exploration via simulation of previously obscure phenomena. We now face the challenge of scaling the impact of these approaches from the specialist to entire communities. I speak here about work that seeks to address this goal by rethinking science's information technology foundations in terms of service-oriented architecture. In principle, service-oriented approaches can have a transformative effect on scientific communities, allowing tools formerly accessible only to the specialist to be made available to all, and permitting previously manual data-processing and analysis tasks to be automated. However, while the potential of such "service-oriented science" has been demonstrated, its routine application across many disciplines raises challenging technical problems. One important requirement is to enable the convenient discovery and composition of services. Another is to achieve a separation of concerns between discipline-specific content and domain-independent infrastructure. A third is to streamline the formation and evolution of the "virtual organizations" that create and access content. I describe the architectural principles, software, and deployments that my colleagues and I have produced as we tackle these problems, and point to future technical challenges and scientific opportunities.

Bio Sketch

Ian Foster was born in Wellington, New Zealand. He has a bachelor of science (Hons I) degree in computer science from the University of Canterbury in Christchurch, New Zealand, and a doctorate in computer science from Imperial College, London. Foster joined the Mathematics and Computer Science Division of Argonne National Laboratory in 1989. He is now the Director of the Computation Institute at Argonne and the University of Chicago, where he is also the Arthur Holly Compton Distinguished Service Professor of Computer Science. His research deals with distributed, parallel, and data-intensive computing technologies; the applications of those technologies to scientific problems; and the mechanisms and policies needed to create and operate scalable scientific "cyberinfrastructures." He has published six books and over 300 articles and technical reports on these and related topics. Dr. Foster is chair of the Globus Management Committee, which leads development of the Globus Toolkit, the open source software that is widely used for Grid computing in both e-business and e-science. Foster is also Chief Open Source Strategist at Univa Corporation, a company he founded with other Globus leaders to foster and promote commercial applications of Grid technology. Dr. Foster is a fellow of the American Association for the Advancement of Science and the British Computer Society. His awards include the British Computer Society's award for technical innovation, the Global Information Infrastructure (GII) Next Generation award, the British Computer Society's Lovelace Medal, R&D Magazine's Innovator of the Year, and a DSc Honoris Causa from the University of Canterbury.

Exploratory Mining in Cube Space

Prof. Raghu Ramakrishnan
Yahoo! Research

Abstract

Data Mining has evolved as a new discipline at the intersection of several existing areas, including Database Systems, Machine Learning, Optimization, and Statistics. An important question is whether the field has matured to the point where it has originated substantial new problems and techniques that distinguish it from its parent disciplines. In this paper, we discuss a class of new problems and techniques that show great promise for exploratory mining, while synthesizing and generalizing ideas from the parent disciplines. While the class of problems we discuss is broad, there is a common underlying objective: to look beyond a single data mining step (e.g., data summarization or model construction) and address the combined process of data selection and transformation, parameter and algorithm selection, and model construction. The fundamental difficulty lies in the large space of alternative choices at each step, and good solutions must provide a natural framework for managing this complexity. We regard this as a grand challenge for Data Mining, and see the ideas in this paper as promising initial steps towards a rigorous exploratory framework that supports the entire process.

This is joint work with several people, in particular Bee-Chung Chen.

Bio Sketch

Dr. Ramakrishnan has a long history in the data mining field. His seminal clustering work on BIRCH appeared in the first volume of the Data Mining and Knowledge Discovery journal. While known to most for his long service as Professor of Computer Science at the University of Wisconsin, he has recently established a research group at Yahoo! Labs. His talk "Exploratory Mining in Cube Space" looks toward formalizing the grand challenge of a unifying framework to address the complex choices spanning the data mining process. Dr. Ramakrishnan is a fellow of the ACM, and has a Ph.D. from the University of Texas at Austin.

Data mining methods for modeling gene expression regulation and their applications

Prof. Weixiong Zhang
Department of Computer Science and Engineering, Washington University in St. Louis

Abstract

Understanding gene expression regulation at both transcriptional and post-transcriptional levels is critical for elucidating the mechanism of stress tolerance in plants and important for understanding and diagnosing human diseases. With the advent of high-throughput gene expression profiling techniques, a huge amount of gene expression data on various organisms has been collected. Such a wealth of biological data has provided excellent opportunities to elucidate transcriptional regulation mechanisms using machine learning and data mining approaches.

My main purpose in this talk is to demonstrate how machine learning and data mining methods can be developed and applied to analyzing large quantities of genomic information and gene expression data for characterizing and modeling gene expression regulation. In particular, I will present and discuss some of the methods that we have developed for modeling gene expression regulation underlying abiotic stress (e.g., drought, low temperature and salinity) tolerance, for identifying genes responsive to particular environmental stress conditions, and for characterizing the functions of microRNA genes (non-coding RNA genes that are ~21 nucleotides long and play important roles in post-transcriptional gene expression regulation) in stress regulation in the model plant Arabidopsis thaliana. I will describe machine learning and data mining approaches for feature selection and gene expression modeling that we have developed, including 1) a genome-scale approach for finding cis-regulatory elements (short DNA sequences in promoter regions) that can be used as features for modeling transcription regulation, and 2) a bi-dimensional regression tree method for characterizing gene expression regulation that integrates information on cis-regulatory elements and gene expression data. I will also discuss three applications of these computational approaches: 1) what we call a targeted gene finding method for identifying stress-responsive genes in A. thaliana using cis-regulatory elements, 2) a new method for characterizing core promoters of the currently known microRNA genes in C. elegans (worm), H. sapiens (human), O. sativa (rice) and A. thaliana and for predicting promoters of microRNA genes, and 3) a novel functional annotation method for discovering microRNA genes in A. thaliana that are inducible by abiotic stresses.

Bio Sketch

Professor Weixiong Zhang is an Associate Professor in Computer Science and Genetics at Washington University in St. Louis, Missouri, USA. He received his B.S. and M.S. in computer engineering from Tsinghua University, Beijing, China, and his Ph.D. in computer science from the University of California at Los Angeles (UCLA). Professor Zhang's research interests include computational molecular biology and genomics, artificial intelligence (heuristic search, machine learning, constraint optimization, distributed multi-agent systems), data mining, and combinatorial optimization. He has published more than 80 papers in these areas and is the author of a research monograph, State-Space Search: Algorithms, Complexity, Extensions and Applications, published by Springer in 1999.