Via FriendFeed, I've started a collaboration with Andrew Su on the state of the protein articles in Wikipedia.
I've used the MediaWiki API (http://en.wikipedia.org/w/api.php) to extract the revisions and sizes of all the articles containing Template:PBB_Summary.
A result of this survey is available here.
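As a rough illustration of the extraction step, here is a minimal Python sketch of how such a query could be built. It is not the code used for this survey. It assumes the API's `embeddedin` generator combined with `prop=revisions`, which returns only the latest revision of each page; collecting full revision histories would need additional per-page queries. The function names, the JSON output format, and the restriction to namespace 0 are assumptions made for the example.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint quoted in the post


def build_query_url(template="Template:PBB_Summary", limit=50, continue_from=None):
    """Build a URL listing pages that transclude `template`, together with
    the id, timestamp and size of each page's latest revision."""
    params = {
        "action": "query",
        "format": "json",              # assumption: JSON output
        "generator": "embeddedin",     # pages that transclude the template
        "geititle": template,
        "geinamespace": 0,             # assumption: main (article) namespace only
        "geilimit": limit,
        "prop": "revisions",
        "rvprop": "ids|timestamp|size",
    }
    if continue_from is not None:
        # value handed back by the API when more pages remain
        params["geicontinue"] = continue_from
    return API_ENDPOINT + "?" + urlencode(params)


def page_sizes(response):
    """Map each page title to the size in bytes of its listed revision."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page["revisions"][0]["size"]
        for page in pages.values()
        if page.get("revisions")
    }
```

`build_query_url()` only builds the URL; fetching it and following the continuation value until the list is exhausted is left out of the sketch.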
We first tried to use IBM Many Eyes to display the results, but the applet ran out of memory: http://services.alphaworks.ibm.com/manyeyes/view/SkWN8RsOtha6qaEpsJGCR2~.
I then wrote a custom Java interface, inspired by Many Eyes, to display the results. This interface is available as an applet here (or you can run it as a javaws application here, with the previous dataset, where you can save the image as SVG).
This is an ongoing project, and any suggestions will be appreciated. :-)
Pierre
Hi Pierre, I'm having some difficulty loading the data in the applet. I clicked "Category:Cancer Treatment" and saw some of the data for that, but when I tried to click on other categories, the window failed to update.
Also, I don't understand what's being plotted on the Y axis.
@Gotgene: thank you for this feedback.
The Y axis is the size of the pages.
The 3 vertical lists act as a logical 'AND', so if you select a protein in the first panel that does not belong to any category selected in the second panel, the page will remain blank.
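To make the 'AND' behaviour concrete, here is a small Python sketch. It is not the applet's code. The panel names `protein` and `category`, the sample pages, and `Category:Other` are hypothetical, and the third panel is left out because it is not described here.

```python
def is_visible(page, selections):
    """True only if the page matches the selection in every panel (logical AND)."""
    return all(page[panel] & selected for panel, selected in selections.items())


# Hypothetical pages: each panel maps to the set of values the page has.
pages = [
    {"title": "Protein A", "protein": {"Protein A"},
     "category": {"Category:Cancer Treatment"}},
    {"title": "Protein B", "protein": {"Protein B"},
     "category": {"Category:Other"}},
]

# Protein B is selected, but it is not in the selected category.
selections = {"protein": {"Protein B"},
              "category": {"Category:Cancer Treatment"}}

shown = [p["title"] for p in pages if is_visible(p, selections)]
# shown is empty here, so the plot stays blank
```

Selecting "Category:Other" in the second panel instead would make Protein B appear.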