Downloading and installing
The library was downloaded from http://www.open-mpi.org/software/ompi/v1.6/, compiled with the option --with-sge and installed in'/commun/data/packages/openmpi'
../configure --prefix=/commun/data/packages/openmpi --with-sge make make install
Configuring SGE to run MPI-based program
As described in http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html .$ su $ source /commun/sge_root/bird/common/settings.csh $ setenv ARCH lx24-amd64 $ qconf -ap orteAnd added:
pe_name orte slots 448 user_lists NONE xuser_lists NONE start_proc_args /bin/true stop_proc_args /bin/true allocation_rule $round_robin control_slaves TRUE job_is_first_task TRUE urgency_slots min accounting_summary FALSEAdd orte to the parallele env list:
qconf -mattr queue pe_list "orte" all.qUsing qconf -mconf I've also set shell_start_mode to 'unix_behavior'.
Algorithm
In the 'main' method, test if this is the master(==0) or a slave process(!=0):MPI_Comm_rank(MPI_COMM_WORLD, &rank); if (rank == 0) { master(argc-1,&argv[1]); } else { slave(argc-1,&argv[1]); }
MASTER
The master gets the number of available proc:MPI_Comm_size(MPI_COMM_WORLD, &proc_count)Loop over the number of available processors and the BAMS, for each BAM, create a fixed size structure containing the path to the BAM as well as the count of reads initialized to '0':
typedef struct Params_t { char filename[FILENAME_MAX]; int is_error; long count_total; long count_mapped; long count_properly_paired; long count_dup; }Params;This structure is sent to the slaves:
MPI_Send( ¶m, /* message buffer */ sizeof(Params), /* one data item */ MPI_CHAR, /* data item is an array of bytes */ rank, /* destination process rank */ TAG_COUNT_BAM, /* user chosen message tag */ MPI_COMM_WORLD /* default communicator */ );and we wait for the slaves to return those 'Params' with the filled values.:
MPI_Recv(¶m, /* message buffer */ sizeof(Params), /* one data item */ MPI_CHAR, /* data item is an array of bytes */ MPI_ANY_SOURCE, /* receive from any sender */ MPI_ANY_TAG, /* any type of message */ MPI_COMM_WORLD, /* default communicator */ &status);And the result is printed:
printf("%s\t%ld\t%ld\t%ld\t%ld\n", param.filename, param.count_total, param.count_mapped, param.count_properly_paired, param.count_dup );At the end of the master method, when no more BAM is available, the slaves are released (tag is 'TAG_RELEASE_SLAVE') with :
MPI_Send( &empty, /* message buffer */ sizeof(Params), /* one data item */ MPI_CHAR, /* data item is an array of bytes */ rank, /* destination process rank */ TAG_RELEASE_SLAVE, /* SHutdown */ MPI_COMM_WORLD ); /* default communicator */
SLAVE
The slave method receives some messages from the master:MPI_Recv( ¶m, sizeof(Params), MPI_CHAR, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);If the message was 'TAG_RELEASE_SLAVE' we exit the method. Else the BAM file is scanned using the SAmtools API for C.
count_reads(¶m);and the result is sent back to the master:
MPI_Send( ¶m, sizeof(Params), MPI_CHAR, 0, 0, MPI_COMM_WORLD );.
Running the Job on SGE
I still don't understand why I need to write the following line:# qsh -pe orte 4 Your job 84581 ("INTERACTIVE") has been submitted waiting for interactive job to be scheduled ... Your interactive job 84581 has been successfully scheduled.Run the analysis:
qrsh -cwd -pe orte 100 /commun/data/packages/openmpi/bin/mpirun \ /path/to/ompibam \ `find /commun/data/projects/folder1234/align -name "*.bam"`qstat -f shows the jobs running:
arch states --------------------------------------------------------------------------------- all.q@node01 BIP 0/15/64 2.76 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 15 --------------------------------------------------------------------------------- all.q@node02 BIP 0/14/64 3.89 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14 --------------------------------------------------------------------------------- all.q@node03 BIP 0/14/64 3.23 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14 --------------------------------------------------------------------------------- all.q@node04 BIP 0/14/64 3.68 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14 --------------------------------------------------------------------------------- all.q@node05 BIP 0/15/64 2.91 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 15 --------------------------------------------------------------------------------- all.q@node06 BIP 0/14/64 3.91 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14 --------------------------------------------------------------------------------- all.q@node07 BIP 0/14/64 3.79 lx24-amd64 84598 0.55500 mpirun lindenb r 02/11/2013 12:03:36 14output:
#filename total mapped properly_paired dup /commun/data/projects/folder1234/11X0589_markdup.bam 1179766 973943 684794 0 /commun/data/projects/folder1234/11X0589_pair1_sorted.bam 1179766 973943 684794 0 /commun/data/projects/folder1234/11X0589_realigned.bam 1179766 973943 684794 0 /commun/data/projects/folder1234/11X0589_recal.bam 1179766 973943 684794 0 /commun/data/projects/folder1234/11X0602_pair1_sorted.bam 982006 815636 566136 0 /commun/data/projects/folder1234/11X0602_realigned.bam 982006 815636 566136 0 /commun/data/projects/folder1234/11X0602_markdup.bam 982006 815636 566136 0 /commun/data/projects/folder1234/11X0602_pair1_unsorted.bam 982006 815636 566136 0 /commun/data/projects/folder1234/11X0375_pair1_sorted.bam 1255772 1031155 735480 0 /commun/data/projects/folder1234/11X0375_realigned.bam 1255772 1031155 735480 0 /commun/data/projects/folder1234/11X0375_markdup.bam 1255772 1031155 735480 0 /commun/data/projects/folder1234/11X0367_pair1_sorted.bam 1422144 1158315 796466 0 /commun/data/projects/folder1234/11X0367_markdup.bam 1422144 1158315 796466 0 /commun/data/projects/folder1234/11X0375_pair1_unsorted.bam 1255772 1031155 735480 0 /commun/data/projects/folder1234/11X0367_realigned.bam 1422144 1158315 796466 0 /commun/data/projects/folder1234/11X0291_pair1_sorted.bam 1296294 1067233 743924 0 /commun/data/projects/folder1234/11X0291_markdup.bam 1296294 1067233 743924 0 /commun/data/projects/folder1234/11X0291_realigned.bam 1296294 1067233 743924 0 /commun/data/projects/folder1234/11X0311_markdup.bam 1353534 1117463 788474 0 /commun/data/projects/folder1234/11X0311_realigned.bam 1353534 1117463 788474 0 /commun/data/projects/folder1234/11X0375_recal.bam 1255772 1031155 735480 0 /commun/data/projects/folder1234/11X0589_pair1_unsorted.bam 1179766 973943 684794 0 /commun/data/projects/folder1234/11X0367_pair1_unsorted.bam 1422144 1158315 796466 0 /commun/data/projects/folder1234/11X0290_pair1_sorted.bam 1436970 1191044 837726 0 /commun/data/projects/folder1234/11X0290_realigned.bam 1436970 1191044 837726 0 /commun/data/projects/folder1234/11X0291_pair1_unsorted.bam 1296294 1067233 743924 0 /commun/data/projects/folder1234/11X0311_pair1_sorted.bam 1353534 1117463 788474 0 /commun/data/projects/folder1234/11X0240_pair1_sorted.bam 1489672 1229654 830042 0 /commun/data/projects/folder1234/11X0240_realigned.bam 1489672 1229654 830042 0 /commun/data/projects/folder1234/11X0240_markdup.bam 1489672 1229654 830042 0 /commun/data/projects/folder1234/XX10272_pair1_sorted.bam 1579608 1296449 876040 0 /commun/data/projects/folder1234/XX10272_markdup.bam 1579608 1296449 876040 0 /commun/data/projects/folder1234/XX10272_realigned.bam 1579608 1296449 876040 0 /commun/data/projects/folder1234/11X0602_recal.bam 982006 815636 566136 0 /commun/data/projects/folder1234/11X0290_markdup.bam 1436970 1191044 837726 0 /commun/data/projects/folder1234/11X0290_pair1_unsorted.bam 1436970 1191044 837726 0 /commun/data/projects/folder1234/11X0247_pair1_sorted.bam 1489832 1233208 862842 0 /commun/data/projects/folder1234/11X0456_pair1_sorted.bam 1486490 1220969 848738 0 /commun/data/projects/folder1234/11X0456_realigned.bam 1486490 1220969 848738 0 /commun/data/projects/folder1234/11X0247_markdup.bam 1489832 1233208 862842 0 /commun/data/projects/folder1234/11X0247_realigned.bam 1489832 1233208 862842 0 /commun/data/projects/folder1234/11X0456_markdup.bam 1486490 1220969 848738 0 /commun/data/projects/folder1234/11X0367_recal.bam 1422144 1158315 796466 0 /commun/data/projects/folder1234/11X0305_pair1_sorted.bam 1818236 1489267 1027458 0 /commun/data/projects/folder1234/11X0305_markdup.bam 1818236 1489267 1027458 0 /commun/data/projects/folder1234/11X0305_realigned.bam 1818236 1489267 1027458 0 /commun/data/projects/folder1234/11X0542_pair1_sorted.bam 1784860 1469183 1016314 0 /commun/data/projects/folder1234/XX07551_recal.bam 1778358 1456518 1018694 0 /commun/data/projects/folder1234/11X0542_realigned.bam 1784860 1469183 1016314 0 /commun/data/projects/folder1234/11X0542_markdup.bam 1784860 1469183 1016314 0 /commun/data/projects/folder1234/11X0311_pair1_unsorted.bam 1353534 1117463 788474 0 /commun/data/projects/folder1234/11X0291_recal.bam 1296294 1067233 743924 0 /commun/data/projects/folder1234/XX07551_pair1_sorted.bam 1778358 1456518 1018694 0 /commun/data/projects/folder1234/11X0240_pair1_unsorted.bam 1489672 1229654 830042 0 /commun/data/projects/folder1234/XX10272_pair1_unsorted.bam 1579608 1296449 876040 0 /commun/data/projects/folder1234/XX07551_markdup.bam 1778358 1456518 1018694 0 /commun/data/projects/folder1234/XX07551_realigned.bam 1778358 1456518 1018694 0 /commun/data/projects/folder1234/11X0305_pair1_unsorted.bam 1818236 1489267 1027458 0 /commun/data/projects/folder1234/11X0311_recal.bam 1353534 1117463 788474 0 /commun/data/projects/folder1234/11X0456_pair1_unsorted.bam 1486490 1220969 848738 0 /commun/data/projects/folder1234/11X0247_pair1_unsorted.bam 1489832 1233208 862842 0 /commun/data/projects/folder1234/11X0542_pair1_unsorted.bam 1784860 1469183 1016314 0 /commun/data/projects/folder1234/XX07551_pair1_unsorted.bam 1778358 1456518 1018694 0 /commun/data/projects/folder1234/11X0433_pair1_sorted.bam 2178798 1800943 1249738 0 /commun/data/projects/folder1234/11X0433_markdup.bam 2178798 1800943 1249738 0 /commun/data/projects/folder1234/XX10272_recal.bam 1579608 1296449 876040 0 /commun/data/projects/folder1234/11X0433_realigned.bam 2178798 1800943 1249738 0 /commun/data/projects/folder1234/11X0240_recal.bam 1489672 1229654 830042 0 /commun/data/projects/folder1234/11X0290_recal.bam 1436970 1191044 837726 0 /commun/data/projects/folder1234/11X0456_recal.bam 1486490 1220969 848738 0 /commun/data/projects/folder1234/11X0247_recal.bam 1489832 1233208 862842 0 /commun/data/projects/folder1234/11X0433_pair1_unsorted.bam 2178798 1800943 1249738 0 /commun/data/projects/folder1234/11X0305_recal.bam 1818236 1489267 1027458 0 /commun/data/projects/folder1234/11X0542_recal.bam 1784860 1469183 1016314 0 /commun/data/projects/folder1234/11X0433_recal.bam 2178798 1800943 1249738 0However I noticed seems that running this code on the cluster is slower that running it on the master node...
The C code
Note: the C code is compiled with ortecc, not gcc.I'm still very new to this API and to those concepts. Feel free to suggest anything :-)
That's it,
Pierre
Hi Pierre,
ReplyDeleteThanks a ton for this post, I'm going to try this out on my cluster here. I have a tool which iterates through a bcf/vcf file and computes a statistic at each locus. I'll let you know if I can parallelize that using OpenMP !
Thanks again,
Avinash