Data Mining

    The ability of microarray technology to measure the expression of thousands of genes at a time has increased the need to cross-reference experimental data with previously reported biological facts, theories and results.

    Biomedical literature databases provide the knowledge warehouses required for such cross-referencing. However, the overwhelming volume of biomedical literature makes this task intimidating.

Analyze Affymetrix CEL files with our powerful customized function
 
The ontology mapping tool helps users understand the biological significance of differentially expressed gene lists derived from high-throughput experiments. It maps genes to their corresponding Gene Ontology (GO) terms and ranks the statistical significance of GO term matches based on the hypergeometric distribution. The method is rather simple, but it is helpful because it translates a list of genes into biological meaning.
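
As a rough illustration of ranking GO terms by a hypergeometric test, here is a minimal Python sketch. It is not the tool's actual implementation. The function names, the input layout (a dictionary from GO term to annotated genes) and the use of an upper-tail p-value are all assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_pvalue(population_size, annotated_in_population,
                                list_size, annotated_in_list):
    """Upper-tail hypergeometric probability P(X >= annotated_in_list).

    X is the number of annotated genes seen when list_size genes are drawn
    without replacement from a population of population_size genes, of which
    annotated_in_population carry the GO term.
    """
    upper = min(annotated_in_population, list_size)
    total = comb(population_size, list_size)
    tail = sum(
        comb(annotated_in_population, i)
        * comb(population_size - annotated_in_population, list_size - i)
        for i in range(annotated_in_list, upper + 1)
    )
    return tail / total


def rank_go_terms(gene_list, go_annotations, population):
    """Return (term, hits, p_value) tuples sorted by ascending p-value.

    go_annotations is a hypothetical mapping {go_term: set_of_genes};
    population is the set of all genes that could have been selected.
    """
    population = set(population)
    selected = set(gene_list) & population
    ranked = []
    for term, genes in go_annotations.items():
        annotated = set(genes) & population
        hits = len(selected & annotated)
        p_value = hypergeom_enrichment_pvalue(
            len(population), len(annotated), len(selected), hits
        )
        ranked.append((term, hits, p_value))
    # Smallest p-value first: the most over-represented term leads the list.
    ranked.sort(key=lambda row: row[2])
    return ranked


# Toy data with made-up gene and term identifiers, for illustration only.
population = {f"g{i}" for i in range(10)}
go_annotations = {
    "GO:A": {"g0", "g1", "g2"},
    "GO:B": {"g0", "g5", "g6", "g7", "g8"},
}
ranked = rank_go_terms({"g0", "g1", "g2"}, go_annotations, population)
```

In practice many GO terms are tested at once, so the raw p-values would normally be adjusted for multiple testing before they are interpreted.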
 
 
On one side, results are pouring in from microarray experiments. On the other side, the volume of literature is growing at an unprecedented rate. Medline alone contains more than 12 million citations, making it almost impossible for researchers to keep up with current research in their fields. There is an urgent need to bridge the gap between high-throughput experiments and vast knowledge repositories. Automatic extraction of information from the biomedical literature will thus play a critical role in aiding research and speeding up the discovery process. We are currently interested in identifying and extracting macromolecular entities from the biomedical literature. Our approach will map the identified entities to individual LocusLink entries, thus enabling the seamless integration of literature information with existing gene and protein databases.
   
Our goal is to develop an effective solution that can facilitate the mining of Medline literature related to genetic studies and gene/protein function studies.
 
 
Because no consensus tagged corpus exists in the biomedical domain, this page is set up to let users evaluate the performance of some components with their own test data, which facilitates the evaluation process.
 
 
The Neighborhood Analysis Algorithm is based on the paper by Vamsi K. Mootha et al., "Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics", PNAS, January 21, 2003, vol. 100, no. 2.
 
 

 

 

© MicroArray Lab