19 February 2008

Freebase Wikipedia Extraction (WEX)

Via the Freebase blog.

The Freebase Wikipedia Extraction (WEX) http://download.freebase.com/wex/ is a processed dump of the English language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form.

Freebase WEX is provided as a set of database tables in TSV format for PostgreSQL, along with tables providing mappings between Wikipedia articles and Freebase topics, and corresponding Freebase Types


See also:



Pierre

1 comment:

Anonymous said...

Pierre,

I can't see the body of your posts in my feed reader.

Loving your work with Freebase and feeling quite envious :)