Showing posts with label pedigree. Show all posts

23 April 2010

Short post: Plain text vs Binary data.

This week, I was asked about the difference between storing data in a plain text format and in a binary format. I wrote the following C++ code to illustrate how to store genotypes in a (possibly huge) table saved as a binary file. The first bytes of the file give the number of individuals, the number of markers and the names of the individuals. Then, for each marker, come the name of the marker, its position and an array holding one genotype per individual.

The first invocation of this program (-p write) creates a random table, and a second call (-p read) returns the genotypes for some random individuals/markers.
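The original C++ listing is not reproduced on this page. As a rough illustration of the layout described above, here is a minimal sketch in Java (the language used in the rest of this page), not the original program. It rests on assumptions that are not in the post: names are stored as fixed-width fields of NAME_LEN bytes, positions are 4-byte integers, and each genotype is a single byte. The class GenotypeTable and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical fixed-width binary genotype table; not the original C++ program. */
class GenotypeTable {
    static final int NAME_LEN = 16; // assumed fixed width of a name, in bytes

    /** Writes a name as exactly NAME_LEN bytes, truncated or zero-padded. */
    static void writeName(RandomAccessFile f, String name) throws IOException {
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        f.write(Arrays.copyOf(raw, NAME_LEN));
    }

    /** Layout: nIndividuals, nMarkers, individual names, then one record per marker. */
    static void write(File file, String[] individuals, String[] markers,
                      int[] positions, byte[][] genotypes) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file, "rw")) {
            f.setLength(0); // start from an empty file
            f.writeInt(individuals.length);
            f.writeInt(markers.length);
            for (String name : individuals) writeName(f, name);
            for (int m = 0; m < markers.length; m++) {
                if (genotypes[m].length != individuals.length)
                    throw new IllegalArgumentException("one genotype per individual expected");
                writeName(f, markers[m]);
                f.writeInt(positions[m]);
                f.write(genotypes[m]); // one byte per individual
            }
        }
    }

    /** Reads a single genotype by seeking straight to its byte offset. */
    static byte read(File file, int marker, int individual) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file, "r")) {
            int nIndividuals = f.readInt();
            int nMarkers = f.readInt();
            if (marker < 0 || marker >= nMarkers
                    || individual < 0 || individual >= nIndividuals)
                throw new IndexOutOfBoundsException();
            long header = 8L + (long) nIndividuals * NAME_LEN;
            long recordSize = NAME_LEN + 4L + nIndividuals;
            f.seek(header + marker * recordSize + NAME_LEN + 4 + individual);
            return f.readByte();
        }
    }
}
```

Because every record has the same size, the offset of any genotype can be computed from the two counts in the header, so a lookup costs one seek and a one-byte read. A plain text table would generally have to be scanned line by line to reach the same cell.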



That's it.

Pierre

01 September 2009

Using the BerkeleyDB Direct Persistence Layer: my notebook

In this post I show how I've used the Java BerkeleyDB API / Direct Persistence Layer to store a set of individuals in a BerkeleyDB database.


In a previous post, I showed how to use the BerkeleyDB API, a key/value database, to store some RDF statements. In that example, a set of TupleBinding classes was created to read and write the Java objects from/to the BerkeleyDB database.
Via Oracle: The Direct Persistence Layer (DPL) is one of two APIs that BerkeleyDB provides for interaction with databases. The DPL provides the ability to cause any Java type to be persistent without implementing special interfaces. The only real requirement is that each persistent class have a default constructor. No hand-coding of bindings is required. A binding is a way of transforming data types into a format which can be stored in a JE database. No external schema is required to define primary and secondary index keys. Java annotations are used to define all metadata.

OK, say you want to store a set of individuals in a BerkeleyDB database. The class Individual will be annotated with the @Entity annotation to tell the DPL that it should save this class. The primary key will be annotated with @PrimaryKey and will be automatically filled by the BerkeleyDB engine.
@Entity //Indicates a persistent entity class.
public class Individual
{
@PrimaryKey(sequence="individual") //Indicates the primary key field of an entity class
private long id;
(...)
}
We also want quick access to the family names, to the fathers and to the mothers. A @SecondaryKey annotation is used to create those secondary indexes. The secondary indexes on the parents also act as a constraint: a reference to a parent is allowed only if its ID already exists in the database.
@Entity
public class Individual
{
@PrimaryKey(sequence="individual")
private long id;
private String firstName=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE)
private String lastName=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE,
relatedEntity=Individual.class,
onRelatedEntityDelete=DeleteAction.NULLIFY
)
private Long fatherId=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE,
relatedEntity=Individual.class,
onRelatedEntityDelete=DeleteAction.NULLIFY)
private Long motherId=null;
(...)
}
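To make the two roles of a secondary key more concrete, here is a small in-memory analogy written with plain java.util collections. It is only a sketch of the idea and does not use, or claim to reproduce, the BerkeleyDB implementation: the class TinyStore and its methods are hypothetical, and the way a bad parent reference is rejected here (an IllegalStateException) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory analogy of a primary index, a MANY_TO_ONE secondary index and a parent constraint. */
class TinyStore {
    static final class Person {
        final long id;
        final String lastName;
        final Long fatherId; // null for a founder
        Person(long id, String lastName, Long fatherId) {
            this.id = id;
            this.lastName = lastName;
            this.fatherId = fatherId;
        }
    }

    // primary index: id -> record
    private final Map<Long, Person> byId = new HashMap<>();
    // secondary index: many individuals may share one last name
    private final TreeMap<String, List<Long>> idsByLastName = new TreeMap<>();
    private long sequence = 0; // plays the role of the "individual" sequence

    /** Stores a person and returns the generated id. */
    long put(String lastName, Long fatherId) {
        // constraint: a parent reference must point to an existing record
        if (fatherId != null && !byId.containsKey(fatherId))
            throw new IllegalStateException("unknown father id " + fatherId);
        long id = ++sequence;
        byId.put(id, new Person(id, lastName, fatherId));
        idsByLastName.computeIfAbsent(lastName, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Looks up the ids of all persons with the given last name. */
    List<Long> idsWithLastName(String lastName) {
        return idsByLastName.getOrDefault(lastName, new ArrayList<Long>());
    }
}
```

The lookup by last name never scans the primary map, which is the point of keeping a secondary index. The check at the top of put() mirrors the relatedEntity constraint: a child can only be stored after its parent.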

Finally, here is the full source code of the class Individual.
package dpl;
import com.sleepycat.persist.model.DeleteAction;
import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;
import com.sleepycat.persist.model.Relationship;
import com.sleepycat.persist.model.SecondaryKey;

@Entity
public class Individual
{
@PrimaryKey(sequence="individual")
private long id;
private String firstName=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE)
private String lastName=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE,
relatedEntity=Individual.class,
onRelatedEntityDelete=DeleteAction.NULLIFY
)
private Long fatherId=null;
@SecondaryKey(relate=Relationship.MANY_TO_ONE,
relatedEntity=Individual.class,
onRelatedEntityDelete=DeleteAction.NULLIFY)
private Long motherId=null;
private int gender=0;

public Individual()
{

}

public Individual(String firstName,String lastName,int gender)
{
this.firstName=firstName;
this.lastName=lastName;
this.gender=gender;
}

public long getId() {
return id;
}



public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}

//Long rather than long: a founder has no known parents, so the ID may be null
public Long getFatherId() {
return fatherId;
}
public void setFatherId(Long fatherId) {
this.fatherId = fatherId;
}
public Long getMotherId() {
return motherId;
}
public void setMotherId(Long motherId) {
this.motherId = motherId;
}

public void setGender(int gender) {
this.gender = gender;
}
public int getGender() {
return gender;
}

@Override
public int hashCode() {
return 31 + (int) (id ^ (id >>> 32));
}

@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (!(obj instanceof Individual))
return false;
Individual other = (Individual) obj;
if (id != other.id)
return false;
return true;
}

@Override
public String toString() {
return getFirstName()+" "+getLastName();
}
}

Opening the Database


The database, the datastore, and the indexes are opened:
EnvironmentConfig envCfg= new EnvironmentConfig();
StoreConfig storeCfg= new StoreConfig();
envCfg.setAllowCreate(true);
envCfg.setTransactional(true);
storeCfg.setAllowCreate(true);
storeCfg.setTransactional(true);
this.environment= new Environment(dataDirectory,envCfg);
this.store= new EntityStore(this.environment,"StoreName",storeCfg);
this.individualById = this.store.getPrimaryIndex(Long.class, Individual.class);
this.individualByLastName= this.store.getSecondaryIndex(this.individualById, String.class, "lastName");

Creating a few Individuals


A transaction is opened, some members of Charles Darwin's family are inserted into the datastore, and the transaction is committed.
Transaction txn;
//create a transaction; passing it to each put() makes the insert part of it
//(put() without a transaction would be auto-committed on its own)
txn= environment.beginTransaction(null, null);

Individual gp1= new Individual("Robert","Darwin",1);
individualById.put(txn,gp1);
Individual gm1= new Individual("Susannah","Wedgwood",2);
individualById.put(txn,gm1);
Individual gp2= new Individual("Josiah","Wedgwood",1);
individualById.put(txn,gp2);
Individual gm2= new Individual("Elisabeth","Allen",2);
individualById.put(txn,gm2);

Individual father= new Individual("Charles","Darwin",1);
father.setFatherId(gp1.getId());
father.setMotherId(gm1.getId());
individualById.put(txn,father);
Individual mother= new Individual("Emma","Wedgwood",2);
mother.setFatherId(gp2.getId());
mother.setMotherId(gm2.getId());
individualById.put(txn,mother);


Individual c1= new Individual("William","Darwin",1);
c1.setFatherId(father.getId());
c1.setMotherId(mother.getId());
individualById.put(txn,c1);
Individual c2= new Individual("Anne Elisabeth","Darwin",2);
c2.setFatherId(father.getId());
c2.setMotherId(mother.getId());
individualById.put(txn,c2);

txn.commit();

Using the secondary indexes


An EntityCursor obtained from the secondary index individualByLastName is used to iterate over all the individuals named "Darwin":
EntityCursor<Individual> cursor = individualByLastName.entities("Darwin", true, "Darwin", true);
for(Individual indi:cursor)
{
System.out.println(indi.getLastName()+"\t"+indi.getFirstName()+"\t"+indi.getId());
}
cursor.close();


Output

###Listing all Darwin
Darwin Robert 1
Darwin Charles 5
Darwin William 7
Darwin Anne Elisabeth 8


Source code



package dpl;

import java.io.File;
import java.util.logging.Logger;

import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.Transaction;
import com.sleepycat.persist.EntityCursor;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.PrimaryIndex;
import com.sleepycat.persist.SecondaryIndex;
import com.sleepycat.persist.StoreConfig;

public class DirectPersistenceLayerTest
{
private static Logger LOG= Logger.getLogger(DirectPersistenceLayerTest.class.getName());
private Environment environment=null;
private EntityStore store;
private PrimaryIndex<Long, Individual> individualById;
private SecondaryIndex<String, Long, Individual> individualByLastName;


public void open(File dir) throws DatabaseException
{
close();
EnvironmentConfig envCfg= new EnvironmentConfig();
StoreConfig storeCfg= new StoreConfig();
envCfg.setAllowCreate(true);
envCfg.setTransactional(true);
storeCfg.setAllowCreate(true);
storeCfg.setTransactional(true);
LOG.info("opening "+dir);
this.environment= new Environment(dir,envCfg);
this.store= new EntityStore(this.environment,"StoreName",storeCfg);
this.individualById = this.store.getPrimaryIndex(Long.class, Individual.class);
this.individualByLastName= this.store.getSecondaryIndex(this.individualById, String.class, "lastName");
}

public void close()
{
if(this.store!=null)
{
LOG.info("close store");
try {
this.store.close();
}
catch (DatabaseException e)
{
LOG.warning(e.getMessage());
}
this.store=null;
}

if(this.environment!=null)
{
LOG.info("close env");
try {
this.environment.cleanLog();
this.environment.close();
}
catch (DatabaseException e)
{
LOG.warning(e.getMessage());
}
this.environment=null;
}
}

void run() throws DatabaseException
{
Transaction txn;
LOG.info("count.individuals="+ individualById.count());
//create a transaction; passing it to each put() makes the insert part of it
//(put() without a transaction would be auto-committed on its own)
txn= environment.beginTransaction(null, null);

Individual gp1= new Individual("Robert","Darwin",1);
individualById.put(txn,gp1);
Individual gm1= new Individual("Susannah","Wedgwood",2);
individualById.put(txn,gm1);
Individual gp2= new Individual("Josiah","Wedgwood",1);
individualById.put(txn,gp2);
Individual gm2= new Individual("Elisabeth","Allen",2);
individualById.put(txn,gm2);

Individual father= new Individual("Charles","Darwin",1);
father.setFatherId(gp1.getId());
father.setMotherId(gm1.getId());
individualById.put(txn,father);
Individual mother= new Individual("Emma","Wedgwood",2);
mother.setFatherId(gp2.getId());
mother.setMotherId(gm2.getId());
individualById.put(txn,mother);


Individual c1= new Individual("William","Darwin",1);
c1.setFatherId(father.getId());
c1.setMotherId(mother.getId());
individualById.put(txn,c1);
Individual c2= new Individual("Anne Elisabeth","Darwin",2);
c2.setFatherId(father.getId());
c2.setMotherId(mother.getId());
individualById.put(txn,c2);

txn.commit();



System.out.println("###Listing all Darwin");
EntityCursor<Individual> cursor = individualByLastName.entities("Darwin", true, "Darwin", true);
for(Individual indi:cursor)
{
System.out.println(indi.getLastName()+"\t"+indi.getFirstName()+"\t"+indi.getId());
}
cursor.close();

LOG.info("count.individuals="+individualById.count());
}

public static void main(String[] args)
{
DirectPersistenceLayerTest app= new DirectPersistenceLayerTest();
try
{
int optind=0;
while(optind< args.length)
{
if(args[optind].equals("-h"))
{
System.err.println("");
}
else if(args[optind].equals("--"))
{
optind++;
break;
}
else if(args[optind].startsWith("-"))
{
System.err.println("Unknown option "+args[optind]);
}
else
{
break;
}
++optind;
}
app.open(new File("/tmp/bdb"));
app.run();
}
catch(Throwable err)
{
err.printStackTrace();
}
finally
{
app.close();
}
LOG.info("done.");
}
}

That's it.
Pierre

28 January 2007

Social Genealogy

From the FAQ: Geni is a unique approach to solving the problem of genealogy, which is the question of how everyone is related. Geni lets you create a family tree through our fun simple interface. When you add a relative's email address, he or she will be invited to join your tree. That relative can then add other relatives, and so on. Your tree will continue to grow as relatives invite other relatives. Each family member has a profile which can be viewed by clicking their name in the tree. This helps family members learn more about each other and stay in touch. Family members can also share information and work together to build profiles for common ancestors. Geni is a private network. Only the people in your tree can see your tree and your profile. Geni will not share your personal information with third parties.



See also, my previous post about how to draw pedigrees using DOT.

Pierre