SNP Function Portal Data Integration Project
 
 

Overview of SNP Function Portal Data Integration

Overview of SNP Function Portal Data Integration

 
          

Since the SNP Function Portal invovles SNP function annotation data from several dozen public data sources as well as multiple in-house data annotation procedures, updating the SNP Function Portal is a complex and time-consuming process. We are now adopting a novel approach of using the make utility to manage the dependency in data download, processing, integration processes. Our solution also enables the easy addition of new SNP function annotations to the SNP Function Portal.

        The current SNP Function Portal Make file incorporates the download and processing of dbSNP data is downloadable here. It uses the GNU General Public License. 
       Another example of Makefile to download, validate and upload cel files from the GEO database is here.

 
 

Basic Procedure :

1. Checking table dependency using Make

Our solution is to create a tag file for each database table. In the following example, tmp_entrezg is a database table, its creation time is the same as file table/tmp_entrezg


table/tmp_entrezg : table/tmp_entreze

echo -e "drop table $(@F);\ncreate table $(@F) as select * from $(<F) where datatype='GENE';" | sqlplus -s $(TNS) | grep -q "Table created."

touch $@

 

2. Use Make to manage jobs submitted to a computer cluster

In the following example, files align/*.jobid depends on db_pass25m1/*.jobid, we use "-W depend" in qsub to make sure the align job is hold until its preresquisite job is done successfully. After Make submitted the alignment job to cluster, it would try to load the alignment result into database immediately, But the first command "while qstat `cat $<` 2> /dev/null; do sleep 60; done" makes sure that it would wait until the alignment job is done.

table/raw_align_% : align/%.jobid

while qstat `cat $<` 2> /dev/null; do sleep 60; done

echo -e "drop table $(@F);\ncreate table $(@F) (chip varchar2(60), x number(4), y number(4), chr varchar2(20), loc number(10), strand varchar2(1), mis_loc number(2), mis_on_chip varchar2(1), mis_on_chr varchar2(1));" | sqlplus -s $(TNS)

echo -e "load data append into table $(@F) fields terminated by '\t'trailing nullcols (strand, chip, x, y, chr, loc, mis_loc, mis_on_chip, mis_on_chr)" > $@.ctl

sqlldr $(TNS) silent=header,feedback,errors,discards,partitions direct=y errors=0 control=$@ data=$@.dat

touch $@

align/%.jobid : db_pass25m1/%.jobid

echo "pass align genome.pass $(@<)/$*.pass > $(@D)/$*.dat" | $(QSUB) -W depend=afterok:`cat $<` > $@