29 October 2008

A survey of the Proteins in Wikipedia

Via FriendFeed, I've started a collaboration with Andrew Su on the state of the Wikipedia articles about proteins.
I've used the MediaWiki API (http://en.wikipedia.org/w/api.php) to extract the revisions and the sizes of all the articles containing Template:PBB_Summary.
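As a rough illustration of this kind of extraction (a minimal sketch, not the code used for the survey), the Java snippet below only builds two MediaWiki API request URLs. The first lists the articles that transclude Template:PBB_Summary (`list=embeddedin`). The second fetches the timestamp and size of each revision of one article (`prop=revisions`, `rvprop=timestamp|size`). The class and method names (`PbbQuery`, `embeddedInUrl`, `revisionsUrl`) are hypothetical. The parameter names follow the current MediaWiki API documentation and may differ from what the API offered in 2008. Fetching and parsing the JSON responses is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds MediaWiki API query URLs (no network access). */
public class PbbQuery {
    static final String API = "http://en.wikipedia.org/w/api.php";

    /** Appends URL-encoded key=value pairs to the API endpoint, in insertion order. */
    static String buildUrl(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", API + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    /** Lists main-namespace pages that transclude the given template. */
    static String embeddedInUrl(String template, String continueToken) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "query");
        p.put("list", "embeddedin");
        p.put("eititle", template);
        p.put("einamespace", "0");   // articles only
        p.put("eilimit", "500");
        if (continueToken != null) {
            p.put("eicontinue", continueToken); // token returned by the previous response
        }
        p.put("format", "json");
        return buildUrl(p);
    }

    /** Requests the timestamp and byte size of the revisions of one article. */
    static String revisionsUrl(String title) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "query");
        p.put("prop", "revisions");
        p.put("titles", title);
        p.put("rvprop", "timestamp|size");
        p.put("rvlimit", "500");
        p.put("format", "json");
        return buildUrl(p);
    }

    public static void main(String[] args) {
        System.out.println(embeddedInUrl("Template:PBB_Summary", null));
        System.out.println(revisionsUrl("Insulin"));
    }
}
```

In practice one would repeat the `embeddedin` request with the continuation token returned by the API until no token comes back, then request the revisions of each listed title.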
The results of this survey are available here.
We first tried to use IBM Many Eyes to display the results, but the applet ran out of memory: http://services.alphaworks.ibm.com/manyeyes/view/SkWN8RsOtha6qaEpsJGCR2~.
I then wrote a custom Java interface, inspired by Many Eyes, to display the results. This interface is available as an applet here (or you can run it as a Java Web Start (javaws) application here, with the previous dataset, where you can save the image as SVG).

This is an ongoing project, and any suggestions will be appreciated. :-)



gotgenes said...

Hi Pierre, I'm having some difficulty loading the data in the applet. I clicked "Category:Cancer Treatment" and saw some of the data for that, but when I tried to click other categories, the window failed to update.

Also, I don't understand what's being plotted on the Y axis.

Pierre Lindenbaum said...

@gotgenes: thank you for this feedback.

The Y axis is the size of the pages.

The three vertical lists act as a logical 'AND': if you select a protein in the first panel that does not belong to any of the categories selected in the second panel, the page will remain blank.
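The 'AND' behaviour described above can be sketched in a few lines of Java. This is only an illustration, not the applet's code. The `Page` record, the `anyOf` helper, and the idea that each panel is a set of selected keys are assumptions, and the comment does not say what the third panel filters on (it would simply be one more predicate in the list). A page is displayed only if every panel accepts it, so a selected protein outside the selected categories yields an empty view.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical model of the panel filter: a page is shown only if every panel accepts it. */
public class AndFilter {
    /** Hypothetical page model: title, categories and size in bytes. */
    record Page(String title, Set<String> categories, int size) {}

    /** A panel accepts a page if at least one of the page's keys is selected in that panel. */
    static Predicate<Page> anyOf(Set<String> selected, Function<Page, Set<String>> keys) {
        return page -> keys.apply(page).stream().anyMatch(selected::contains);
    }

    /** Combines all panels with a logical AND and keeps the pages that pass. */
    static List<Page> visible(List<Page> pages, List<Predicate<Page>> panels) {
        Predicate<Page> all = page -> true;
        for (Predicate<Page> panel : panels) {
            all = all.and(panel);
        }
        return pages.stream().filter(all).collect(Collectors.toList());
    }

    static List<String> titles(List<Page> pages) {
        return pages.stream().map(Page::title).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder data for illustration only.
        List<Page> pages = List.of(
                new Page("ProteinA", Set.of("Category:X"), 1000),
                new Page("ProteinB", Set.of("Category:X", "Category:Y"), 2000));
        Function<Page, Set<String>> title = p -> Set.of(p.title());

        // ProteinA is selected, but it is not in the selected category: nothing is shown.
        System.out.println(titles(visible(pages, List.of(
                anyOf(Set.of("ProteinA"), title),
                anyOf(Set.of("Category:Y"), Page::categories))))); // []

        // ProteinB is selected and belongs to the selected category: it is shown.
        System.out.println(titles(visible(pages, List.of(
                anyOf(Set.of("ProteinB"), title),
                anyOf(Set.of("Category:Y"), Page::categories))))); // [ProteinB]
    }
}
```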