Description of Customized CDF Files
1. Reasons for generating custom CDF files for Affymetrix GeneChips
2. Procedures for generating custom CDF files
3. Statistics of Affymetrix and custom CDF files
4. Known shortcomings of custom CDF files
5. Effects of custom CDF files on the detection of differential expression
6. How to use the custom CDF files
7. Bulletin board for comments and suggestions
1. Reasons for generating custom CDF files for Affymetrix GeneChips
2. Procedures for generating custom CDF files
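As an informal picture of the filtering and grouping rules listed in this section, the sketch below follows the simpler rules of 2.2: a probe must hit only one genomic location, probes mapped to the same target in the correct direction are grouped, and a probe set needs at least three probes ordered by location. The record layout (Hit), the function name and the example identifiers are hypothetical and chosen only for illustration; this is not the pipeline actually used to build the CDF files.

```python
from collections import defaultdict
from typing import NamedTuple

MIN_PROBES_PER_SET = 3  # "at least three oligonucleotide probes"


class Hit(NamedTuple):
    """One perfect-match alignment of a probe (hypothetical record layout)."""
    probe_id: str
    target_id: str     # e.g. a gene, transcript or exon identifier
    location: int      # start position of the hit on the genome
    same_strand: bool  # True if the probe aligns in the target's direction


def build_probe_sets(hits):
    """Group uniquely mapping probes into probe sets of at least three probes."""
    # Collect the distinct genomic locations seen for each probe.
    locations = defaultdict(set)
    for hit in hits:
        locations[hit.probe_id].add(hit.location)

    # target_id -> {probe_id: location}
    probe_sets = defaultdict(dict)
    for hit in hits:
        # Rule: a probe must hit only one genomic location.
        if len(locations[hit.probe_id]) != 1:
            continue
        # Rule: only probes mapped in the correct direction are grouped.
        if not hit.same_strand:
            continue
        probe_sets[hit.target_id][hit.probe_id] = hit.location

    result = {}
    for target_id, members in probe_sets.items():
        # Rule: each probe set must contain at least three probes.
        if len(members) < MIN_PROBES_PER_SET:
            continue
        # Rule: probes in a set are ordered by their location.
        result[target_id] = sorted(members, key=members.get)
    return result


# Made-up example data, not taken from any real chip.
EXAMPLE_HITS = [
    Hit("p1", "GENE_A", 300, True),
    Hit("p2", "GENE_A", 100, True),
    Hit("p3", "GENE_A", 200, True),
    Hit("p4", "GENE_A", 400, True),
    Hit("p4", "GENE_B", 9000, True),   # p4 hits two locations: dropped
    Hit("p5", "GENE_A", 500, False),   # wrong direction: dropped
    Hit("p6", "GENE_B", 9100, True),
    Hit("p7", "GENE_B", 9200, True),   # GENE_B keeps only two probes: set dropped
]

print(build_probe_sets(EXAMPLE_HITS))  # {'GENE_A': ['p2', 'p3', 'p1']}
```

The UniGene rules in 2.1 add further conditions (a hit to cDNA/EST sequences, a single UniGene cluster, sequential alignment within one genomic region) that this sketch leaves out.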
After probe sequences are BLASTed against the
latest UniGene Build and genome sequence, a series of filtering and
grouping criteria are applied for different CDF files.
2.1. UniGene CDF files: They are based on UniGene clustering and genome sequence, and are the closest to the Affymetrix annotation in terms of gene definition.
· A probe must have a perfect match (hit) to both cDNA/EST sequences and the genome sequence.
· A probe must hit only one UniGene cluster and one genomic location.
· All probes representing the same gene must align sequentially in the same direction within the same genomic region.
· Each probe set must contain at least three oligonucleotide probes, and probes in a set are ordered according to their genomic location.
2.2. CDF files for Reference sequence, Entrez Gene and Exon, ENSEMBL Gene, Transcript and Exon, and VEGA Gene, Transcript and Exon
· A probe must hit only one genomic location.
· Probes that can be mapped to the same target sequence in the correct direction are grouped together in the same probe set.
· Each probe set must contain at least three oligonucleotide probes, and probes in a set are ordered according to their location in the corresponding exon.
2.3. Chimpanzee CDF definitions
· Affymetrix-Chimp: Probes in the Affymetrix CDF that are not present in the chimpanzee genome are eliminated from the corresponding human Affymetrix CDF.
· Human UniGene-Chimp: Probes in the human UniGene-based CDF with no hit or with more than one hit on the chimpanzee genome are eliminated.
3. Statistics of Affymetrix and custom CDF files
4. Known shortcomings of custom CDF files
4.1. Probe sets in these custom CDF files contain from 3 to several dozen probes, so the within-chip error differs considerably between probe sets.
4.2. UniGene CDF files: While our criteria ensure the purity of the redefined probe sets based on the available information, we may discard some good probes, since large UniGene clusters may contain a small percentage of sequences from other genes owing to chimeric clones or sequences with significant homology.
4.3. ENSEMBL exon CDF files: There is still significant overlap and redundancy in the ENSEMBL exon definitions. Exons represented by the same probe set can be identified with the probe-exon query function on our website.
4.4. ENSEMBL transcript CDF files: Although ENSEMBL probably provides the most extensive and clear transcript definitions in the public domain, it may not include all known transcripts because of issues such as database synchronization. Probes targeting different regions of a transcript may not behave the same way.
5. Effects of custom CDF files on the detection of differential expression
6. How to use the custom CDF files
Custom CDF files can be selected based on species, Affymetrix GeneChip type, CDF file type and CDF file format on our CDF download webpage.
6.1. Affymetrix MAS5 and dCHIP
The ASCII format CDF is for Affymetrix MAS5 and standalone dCHIP analysis. After unzipping the ASCII CDF package, the custom CDF file can be used in exactly the same way as Affymetrix CDF files. Please note that dCHIP only accepts Affymetrix CDF names, so the custom CDF file must be renamed to the corresponding Affymetrix name.
6.2. BioConductor
The R packages for Win32/LINUX are for using the GeneChip analysis functions in BioConductor on the corresponding platforms. Since version 11, Bioconductor redirects requests to our own repository, http://brainarray.mbni.med.umich.edu/bioc. Alternatively, you can modify the file $R_HOME/etc/repositories to add our repository.
A. Use custom CDF files Version 8 in the R-environment (for Bioconductor 1.9). Our Version 8 of the custom CDF is included in Bioconductor 1.9's repository and is downloaded and installed automatically, just like Affymetrix's original CDF packages. All you need to do is replace the AffyBatch object's cdfName with the custom CDF name. For example,
library(affy)
Please note that the version number has been removed from cdfName since version 8.
B. Use custom CDF files Version 7 in the R-environment (for Bioconductor 1.8). Our Version 7 of the custom CDF is included in Bioconductor 1.8's repository and is downloaded and installed automatically, just like Affymetrix's original CDF packages. All you need to do is replace the AffyBatch object's cdfName. For example,
library(affy)
C. Download and install custom CDF files in the R-environment (for all older Bioconductor versions). Under Linux/Unix, use the command "R CMD INSTALL ?.tar.gz". Under Windows, select the menu "Packages->Install package(s) from local zip files". To use the custom CDF files in data analysis after installation, a single line of R code must be added to replace the default Affymetrix CDF file. The following are some examples for different chip and custom probe set combinations:
data<-ReadAffy();
6.3. The probe mapping file matches individual probes in the custom CDF file to those in the corresponding Affymetrix CDF file.
6.4. The grouping file can be used to find all targets (exons, transcripts) represented by the same probe set. It also contains the range each probe set spans on the genome or transcript, to facilitate RT-PCR primer design.
6.5. The best acc file contains the best nucleic acid accession numbers in the corresponding UniGene databases. Basically, the most reliable (RefSeq>cDNA>EST) short sequence with the maximum probe match count is selected to represent a probe set. Affymetrix’s “Representative Public ID” entries are also updated, and our choice of accession numbers has more probe hits than the original accession in most cases.
6.6. The structure of a custom probe set name is “database entry ID_at”. New probe set names can be linked to their corresponding UniGene and ENSEMBL entries using the “Batch query custom probe sets identity” function at http://brainarray.mhri.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp .
6.7. The effect of various custom CDF files can be tested on CEL files deposited in NCBI GEO and EBI ArrayExpress through the “GeneChip Analysis using Custom CDF files” function at http://brainarray.mhri.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp .
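The selection rule described in 6.5 can be read in more than one way, so the following is only a sketch under a stated assumption: candidates are ranked first by reliability class (RefSeq>cDNA>EST), then by probe match count, and a shorter sequence is preferred as the final tie-breaker. The Candidate record, the function name and the accession strings are hypothetical placeholders, not entries from the best acc file.

```python
from typing import NamedTuple

# Reliability ranking stated in 6.5: RefSeq > cDNA > EST.
RELIABILITY = {"RefSeq": 3, "cDNA": 2, "EST": 1}


class Candidate(NamedTuple):
    """A sequence that could represent a probe set (hypothetical record layout)."""
    accession: str
    seq_type: str        # "RefSeq", "cDNA" or "EST"
    probes_matched: int  # number of probes in the set that hit this sequence
    length: int          # sequence length in bases


def best_accession(candidates):
    """Pick one accession for a probe set.

    Assumed priority (not confirmed by the source): reliability class first,
    then probe match count, then the shorter sequence as a tie-breaker.
    """
    best = max(
        candidates,
        key=lambda c: (RELIABILITY[c.seq_type], c.probes_matched, -c.length),
    )
    return best.accession


# Made-up candidates for a single probe set.
EXAMPLE = [
    Candidate("EST_0001", "EST", 11, 600),
    Candidate("CDNA_0001", "cDNA", 11, 2400),
    Candidate("REFSEQ_0001", "RefSeq", 9, 3100),
    Candidate("REFSEQ_0002", "RefSeq", 10, 5200),
    Candidate("REFSEQ_0003", "RefSeq", 10, 2800),
]

print(best_accession(EXAMPLE))  # REFSEQ_0003
```

If probe match count is meant to take priority over reliability class, swapping the first two elements of the sort key gives that alternative reading.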
7. Bulletin board for comments and suggestions
Questions, Comments and Problems? Discuss at our forum!
Problem with this website? Email us at daimh@umich.edu
© MicroArray Lab