University of Illinois Department of Statistics

presents
 


ChengXiang Zhai

Department of Computer Science, University of Illinois Urbana-Champaign

"Statistical Language Models for Text Retrieval and Mining"

 

As more and more text information (e.g., Web pages, email messages, and scientific literature) becomes available online, developing software tools such as Web search engines to help people manage this information becomes increasingly important. A major challenge in text information management is to model the uncertainties associated with making inferences on text data. For example, given that a user is interested in finding articles about heart disease, how likely is the user to use the word "heart" in the query? Statistical language models (i.e., probabilistic models of text) have been successfully used to address such questions.

In this talk, I will present some of my research on applying statistical language models to text retrieval and mining. I will first present a decision-theoretic framework for text retrieval and show how this framework naturally allows us to use statistical language models to solve the text retrieval problem. I will then present several specific language models involving multinomial distributions (over words) and mixtures of multinomial distributions, and show how they can improve retrieval accuracy. Finally, I will present a contextualized multinomial mixture model that can be used to discover and analyze spatial and temporal patterns of topic themes in text collections. I will show some results of using this model to analyze news articles, scientific literature, and weblog data.


Thursday, October 19, 2006

4:00 PM

2 Illini Hall

 
