Playing with the C API for sqlite: inserting the hapmap data , my notebook.
In this post, I'll insert a Hapmap genotype file into a SQLite database using the C API for SQLIte. Of course, this is just a test and I would not use SQLite to store this kind of information. Moreover the speed of insertion of SQLite was deadly slow here.
The Genotype file
The first line contains the name of the individuals, the other rows of the file contain one SNP per line and the genotype for each individual:rs6423165 A/G chrY 109805 + ncbi_b36 broad urn:LSID:affymetrix.hapmap.org:Protocol:GenomeWideSNP_6.0:3 urn:LSID:broad.hapmap.org:Assay:SNP_A-8572888:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ GG AG GG AG AG AG AA AA GG AA AG NN AG AA AG NN AA AG NN GG GG GG GG AA GG GG AG AG AA AG GG AG AG AG AA AG AG AG GG GG AA AA NN GG AG NN AA AG NN GG NN GG GG GG AA AG AA AA AG GG AG AG GG AG AA AG GG AG GG GG AG GG AG GG GG AG GG GG AA NN AG GG AG GG GG GG AG GG GG AA AA GG GG GG NN AG AG AG GG AG AG AA AA GG AA GG AA AG AA AG GG AG GG GG AG AG GG GG AG AG AG NN AG AG AG AG AG NN AA AA AG AG AG AG AG AA AA AG AG AA AG GG AG GG GG AG AG AG AG AG AG AA AG AG AG AG AA AA AG GG GG GG GG AG AG GG GG AG AA GG AG AG GG AG
rs6608381 C/T chrY 109922 + ncbi_b36 broad urn:LSID:affymetrix.hapmap.org:Protocol:GenomeWideSNP_6.0:3 urn:LSID:broad.hapmap.org:Assay:SNP_A-8528859:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ CC CT CC CT CT CT TT TT CC TT CT NN CT TT CT NN TT CC NN CC CC CC CC TT CC CC CT CT TT CT CC CT CT CT TT CT CT CT CC CC TT TT NN CC CT NN TT CT NN CC NN CC CC CC TT CT TT TT CT CC CT CT CC CT TT CT CC CT CC CC CT CC CT CC CC CT CC CC TT NN CT CC CT CC CC CC CT CC CC TT TT CC CC CC NN CT CT CT CC CT CT TT TT CC TT CC TT CT TT CT CC CT CC CC CT CT CC CC CT CT CT NN CT CT CT CT CT NN TT TT CT CT CT CT CT TT TT CT CT TT CT CC CT CC CC CT CT CC CT CT CC TT CT CT CT CT TT TT CT CC CC CC CC CT CT CC CC CT TT CC CT CT CC CT
rs6644972 A/G chrY 118624 + ncbi_b36 sanger urn:LSID:illumina.hapmap.org:Protocol:Human_1M_BeadChip:3 urn:LSID:sanger.hapmap.org:Assay:H1Mrs6644972:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ GG GG AG GG AG AG AG GG AG GG AG NN GG GG AG NN GG GG NN AG AG GG GG GG GG GG GG GG GG AG AA GG GG GG AG GG GG GG GG GG GG GG NN GG AG GG GG GG NN AG NN AG AG AA GG GG GG GG GG GG GG GG GG AG GG GG AA AG GG AG GG GG GG GG GG GG GG AG AG NN AG GG AG GG GG GG AG GG GG GG GG AA AA AG NN GG GG GG GG GG GG GG AG GG GG GG GG AG GG GG AG GG GG GG GG GG GG GG GG GG GG AG GG GG GG GG GG NN GG GG AG GG GG AG GG AG GG AG AA GG GG GG GG AG GG GG GG AG GG AG GG GG GG AG GG GG AG GG AG GG GG GG GG GG GG AG GG GG GG GG GG GG GG GG
rs6655397 A/G chrY 141935 + ncbi_b36 sanger urn:LSID:illumina.hapmap.org:Protocol:Human_1M_BeadChip:3 urn:LSID:sanger.hapmap.org:Assay:H1Mrs6655397:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ GG GG GG GG AG AG GG GG GG GG GG NN GG GG NN NN GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG NN AG GG GG GG GG NN AG NN GG GG GG NN AG GG GG GG AG GG GG AG GG GG GG GG AG GG AG GG GG GG GG AG AG GG GG GG NN AG GG GG GG GG GG GG GG GG GG GG GG GG GG NN GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG NN GG GG AG GG AG NN GG GG GG GG GG GG GG GG AG GG GG AG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG AG GG GG GG GG GG GG
rs6603172 C/T chrY 142439 + ncbi_b36 broad urn:LSID:affymetrix.hapmap.org:Protocol:GenomeWideSNP_6.0:3 urn:LSID:broad.hapmap.org:Assay:SNP_A-8527413:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ TT TT TT TT CT CT TT TT TT TT TT NN TT TT TT NN TT TT NN TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT NN CT TT TT TT TT NN CT NN TT TT TT TT CT TT TT TT CT TT TT CT TT TT TT TT CT TT CT TT TT TT TT CT CT TT TT TT NN CT TT TT TT TT TT TT TT TT TT TT TT TT TT NN TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT CT TT CT NN TT TT TT TT TT TT TT TT CT TT TT CT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT CT TT TT TT TT TT TT
rs6644970 A/G chrY 142664 + ncbi_b36 broad urn:LSID:affymetrix.hapmap.org:Protocol:GenomeWideSNP_6.0:3 urn:LSID:broad.hapmap.org:Assay:SNP_A-4207883:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ AG AA AA AG AG GG AG AG AA AA GG NN AG AA GG NN AG AG NN AG AG AG GG AG AG AG AA AG GG GG AA AG AG AG AG AG AA AA AG GG AG GG NN GG AG NN AG AG NN AG NN AA AG AA AA AG GG GG AG AG AG AG AG AG AA AG AA AG GG GG AG AG AA GG AG GG AA AG AG NN AG AG GG AG GG GG AG GG GG AA AG AA AG GG NN GG AA GG AG AG GG GG AA GG AA AG AG GG GG AG AG GG AG AG AG GG AG AG AG AG AG NN AG GG AG AG GG NN AA AG AA AA GG AG AG AG GG AG AG AG GG AG AA AG GG AA AA AG GG AA AG AG GG AA GG AA AA AG AG AG AG AG GG AG GG GG AG AG AG AA AA GG AG AG
rs11019 A/G chrY 159978 + ncbi_B36 affymetrix urn:LSID:affymetrix.hapmap.org:Protocol:GenomeWideSNP_6.0:2 urn:LSID:affymetrix.hapmap.org:Assay:SNP_A-4207841:2 urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ NN AG NN NN AG GG AA NN NN AG NN AG GG AA NN AG NN NN GG NN GG AG GG NN NN GG NN GG NN GG AG GG NN NN GG AG NN NN NN AG AG NN NN NN NN AA AG GG GG AA AG GG GG NN NN GG GG AG GG AG AG NN AA AA NN NN NN NN NN NN NN NN NN NN GG AG GG AG AG GG AG AG GG GG NN GG GG AG AG GG GG AG AG AG AG AG GG GG GG NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN AG NN AA GG NN NN GG NN NN GG GG AG AA AG GG AA AA NN NN NN NN NN NN AG GG GG AG GG AG NN NN NN NN NN NN NN NN NN GG GG AG AG GG GG NN GG NN NN GG AG
rs1132353 C/T chrY 160116 + ncbi_b36 sanger urn:LSID:illumina.hapmap.org:Protocol:Human_1M_BeadChip:3 urn:LSID:sanger.hapmap.org:Assay:H1Mrs1132353:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ CC CT TT CT CT TT NN CT TT CT CC NN TT CC CT NN TT CT NN CT TT CT CT TT TT CC TT CT CC TT CT TT TT TT TT CT TT TT TT CT CT TT NN TT CT NN CT TT NN CC NN TT TT CC NN TT TT CT TT CT CT CC CC CC CT CT CC CT CC CT CT TT TT TT TT CT TT CT CT NN CT CT TT TT TT TT TT CC CT TT TT CT CT CT NN CT TT TT TT TT TT TT TT TT TT TT CT CT CC TT CT TT CT CT CT CT CT CC CC CC CC TT CC CT CT CT CC NN TT CT TT TT CT TT CT CT CC CT TT CC CC TT TT TT CT TT TT CT TT TT CT TT CT TT TT CT TT CT TT CT CT CT TT TT CT CT TT TT CT TT TT CC TT CT
rs35047434 C/T chrY 169118 + ncbi_b36 sanger urn:LSID:illumina.hapmap.org:Protocol:Human_1M_BeadChip:3 urn:LSID:sanger.hapmap.org:Assay:H1Mrs35047434:3 urn:lsid:dcc.hapmap.org:Panel:CEPH-60-trios:3 QC+ TT TT TT TT CT CT TT TT TT TT TT NN TT TT TT NN TT TT NN CT TT TT TT TT TT TT TT TT TT TT TT TT TT CT TT TT TT CT TT TT CT TT NN CT TT TT TT TT NN TT NN TT TT TT TT TT TT TT TT TT CT TT TT TT TT TT TT CT TT TT TT CT TT CT CT TT TT TT TT NN TT TT TT TT TT TT TT TT TT CT TT TT TT TT NN TT CT TT TT TT CT CT TT TT TT CT CT TT TT CT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT NN TT CT TT CT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT TT CT TT TT TT TT CT TT TT TT CT TT TT TT TT CT TT TT TT CC TT TT TT TT TT TT
(...)
Opening the SQLIte Database
{
fprintf(stderr,"Cannot open sqlite file %s.\n",env->fileout);
exit(EXIT_FAILURE);
}
Closing the SQLIte Database
Creating a table
"create table markers(id INTEGER PRIMARY KEY,name varchar(20) unique,chrom varchar(10),position integer)",
NULL,NULL,&error
)!=SQLITE_OK)
{
fprintf(stderr,"Cannot create table markers: %s\n",error);
exit(EXIT_FAILURE);
}
Filling a prepared statement and inserting some data
"insert into markers(name,chrom,position) values(?,?,?)",
-1,&pstmt_insert_marker,NULL
)!=SQLITE_OK)
{
fprintf(stderr,"Cannot compile insert into markers.\n");
exit(EXIT_FAILURE);
}
(...)
if(sqlite3_bind_text(
pstmt_insert_marker,1,
name,
strlen(name),
NULL)!=SQLITE_OK)
{
fprintf(stderr,"Cannot bind markers's name.\n");
exit(EXIT_FAILURE);
}
(
if(sqlite3_bind_text(
pstmt_insert_marker,2,
chrom,
strlen(chrom),
NULL)!=SQLITE_OK)
{
fprintf(stderr,"Cannot bind markers's chrom.\n");
exit(EXIT_FAILURE);
}
if( sqlite3_bind_int(
pstmt_insert_marker,3,
genomic_position
)!=SQLITE_OK)
{
fprintf(stderr,"Cannot bind markers's position.\n");
exit(EXIT_FAILURE);
}
if (sqlite3_step(pstmt_insert_marker) != SQLITE_DONE) {
fprintf(stderr,"Could not step insert marker.\n");
exit(EXIT_FAILURE);
}
marker_id = sqlite3_last_insert_rowid(env->connection);
sqlite3_reset(pstmt_insert_marker);
Creating a custom function for sqlite
Here I implemented a custom function named homozygous it returns true if a genotype is a string of two identical characters (but not 'N').sqlite3_context* context, /* context */
int argc, /* number of arguments */
sqlite3_value** argv /* arguments */
)
{
/* one text arg */
if(argc==1 &&
sqlite3_value_type(argv[0])==SQLITE_TEXT)
{
const char *alleles=(const char*)sqlite3_value_text(argv[0]);
if( alleles!=NULL &&
strlen(alleles)==2 &&
alleles[0]==alleles[1] &&
alleles[0]!='N')
{
/* set result to TRUE */
sqlite3_result_int(context,1);
return;
}
}
/* set result to FALSE */
sqlite3_result_int(context,0);
}
Telling sqlite about the new custom function
"homozygous",/* function name */
1,/* number of args */
SQLITE_UTF8,/* encoding */
NULL,
is_homozygous,/* the implementation callback */
NULL,
NULL
);
Implementing a callback for a simple 'SELECT' query
The following callback prints the results to stdout.void* notUsed,
int argc,/* number of args */
char** argv,/* arguments as string */
char** columns /* labels for the columns */
)
{
int i;
/* prints all the columns to stdout*/
for(i=0; i<argc; i++)
{
printf( "\"%s\": \"%s\"\n",
columns[i],
argv[i]!=NULL? argv[i] : "NULL"
);
}
printf("\n");
return 0;
}
Executing a SELECT query
Here, we select all the homoygous genotypes:env->connection, /* An open database */
"select individuals.name,"
"markers.name,"
"markers.chrom,"
"markers.position,"
"genotypes.alleles "
"from "
"genotypes,"
"markers,"
"individuals "
"where "
"markers.id=genotypes.marker_id and "
"individuals.id=genotypes.individual_id and "
"homozygous(genotypes.alleles)=1", /* SQL to be evaluated */
callback_select_homozygous, /* Callback function */
NULL, /* 1st argument to callback */
&errormsg /* Error msg written here */
)!=SQLITE_OK)
{
fprintf(stderr,"Cannot select.\n");
exit(EXIT_FAILURE);
}
The full source code
Excuting the C program
[LOG]inserted marker ID.2
[LOG]inserted marker ID.3
[LOG]inserted marker ID.4
[LOG]inserted marker ID.5
(... WAIT !!! )
"name": "NA12878"
"name": "rs6603251"
"chrom": "chrY"
"position": "240580"
"alleles": "CC"
"name": "NA12890"
"name": "rs6603251"
"chrom": "chrY"
"position": "240580"
"alleles": "CC"
"name": "NA12892"
"name": "rs6603251"
"chrom": "chrY"
"position": "240580"
"alleles": "CC"
That's it
Pierre