SWEETpedia Listing of 163 Research Articles; NZ Technical Report Affirms Trend
An earlier popular entry on this AI3 blog was “99 Wikipedia Sources Aiding the Semantic Web”. Each academic paper or research article in that compilation used Wikipedia as a basis for semantic Web-related research. Many of you suggested additions to that listing. Thanks!
Wikipedia continues to be an effective and unique source for many information extraction and semantic Web purposes. Recently, I needed to update my own research and found that many valuable new papers have been added to the literature.
I thus decided to make a compilation of such papers a permanent feature — which I’ve named SWEETpedia — and to update it on a periodic basis. You can now find the most recent version under the permanent SWEETpedia page link.
Hint, hint: Check out this link to see the 163 Wikipedia research sources!
NOTE: If you know of a paper that I’ve overlooked, please suggest it as a comment to this posting and I will add it to the next update.
Status of Wikipedia
For starters, the new technical report from New Zealand (discussed below) summarizes the size and status of the English-version Wikipedia with a more discerning eye than usual:
| Articles and related pages | 5,460,000 |
| lists and stubs | 620,000 |
| between category and subcategory | 740,000 |
| between category and article | 7,270,000 |
The size, scope and structure of Wikipedia make it an unprecedented resource for researchers engaged in natural language processing (NLP), information extraction (IE) and semantic Web-related tasks. Further, the more than 250 language versions of Wikipedia also make it a great resource for multi-lingual and translation studies.
Growth of SWEETpedia
In the eight months since I posted the original listing of semantic Web-related research papers using Wikipedia, the new SWEETpedia listing has grown by about 65%. There are now 63 new papers, bringing the total to 163.
Of course, these are not the only academic papers published about or using Wikipedia. The SWEETpedia listing is specifically related to structure, term, or semantic extractions from Wikipedia. Other research about frequency of updates or collaboration or growth or comparisons with standard encyclopedias may also be found under Wikipedia’s own listing of academic studies.
This graph indicates the growth in the use of Wikipedia as a source for semantic Web research. It is hard to tell whether the effort is plateauing; the apparent slight dip in 2008 is too recent to support that conclusion.
For example, the current SWEETpedia listing adds 35% more listings for 2007 than the earlier records held. It is likely that many 2008 papers will likewise be discovered later in 2009. Many of the venues at which these papers get presented can be somewhat obscure, and new researchers keep entering the field.
However, we can conclude that Wikipedia is assuming a role in semantic Web and natural language research never before seen for other frameworks.
Kinds of Semantic Web-related Research
As noted, the new 82-page technical report by Olena Medelyan et al. from the University of Waikato in New Zealand, Mining Meaning from Wikipedia, is now the must-have reference for all things related to the use of Wikipedia for semantic Web and natural language research.
Olena and her co-authors, Catherine Legg, David Milne and Ian Witten, have each published much in this field and were some of the earliest researchers tapping into the wealth of Wikipedia.
They first note the many uses to which Wikipedia is now being put:
- Wikipedia as an encyclopedia — the standard use familiar to the general public
- Wikipedia as corpus — large text collections for testing and modeling NLP tasks
- Wikipedia as a thesaurus — equivalent and hierarchical relationships between terms and related or synonymous terms
- Wikipedia as a database — the extraction and codification of structure and structural relationships
- Wikipedia as an ontology — the formal expression of relationships in semantic Web and logical constructs, and
- Wikipedia as a network structure — relationship analysis and mining through Wikipedia’s representation as a network graph.
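To make the "network structure" view a little more concrete, here is a minimal sketch in Python. It is not taken from the report: the category names, article titles, and the `articles_under` helper are all hypothetical, hand-made stand-ins for the category-to-subcategory and category-to-article links that Wikipedia contains. The idea is simply that, once those links are held as a graph, collecting everything under a topic becomes an ordinary graph traversal.

```python
from collections import deque

# Toy, hand-made data (hypothetical; not extracted from Wikipedia).
# Category -> subcategory links.
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Quantum mechanics"],
    "Biology": [],
    "Quantum mechanics": [],
}

# Category -> article links.
ARTICLES = {
    "Science": ["Scientific method"],
    "Physics": ["Force", "Energy"],
    "Biology": ["Cell (biology)"],
    "Quantum mechanics": ["Wave function"],
}


def articles_under(category):
    """Collect every article reachable from `category` by a breadth-first
    walk over category -> subcategory links."""
    seen = {category}
    queue = deque([category])
    found = []
    while queue:
        current = queue.popleft()
        found.extend(ARTICLES.get(current, []))
        for child in SUBCATEGORIES.get(current, []):
            if child not in seen:  # guard against cycles in the category graph
                seen.add(child)
                queue.append(child)
    return found


print(articles_under("Physics"))
```

A real category graph is far larger and messier than this toy, so the `seen` set matters: without it, any loop among categories would keep the walk from terminating.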
These types of uses then enable the authors to place various research efforts and papers into context. They do so via four major clusters of relevant tasks related to language processing and the semantic Web:
- Word sense disambiguation
- words and phrases
- thesaurus and ontology terms
Information Extraction (IE) Tasks:
- Semantic relations in raw (unstructured) text
- Semantic relations in structure
- Typing (classifying) named entities
Ontology Building Tasks:
- Facts extraction and assertion
There are many interesting observations throughout this report. There are also useful links to related tools, supporting and annotated datasets, and key researchers in the field.
I highly recommend this report as the essential starting point for anyone first getting into these research topics. Many of the newly added references to the SWEETpedia listing arose from this report. Reading it provides useful grounding for knowing where to look for specific papers in a given task area.
Though clearly the authors have their own perspectives and research emphases, they do an admirable job of being complete and even-handed in their coverage. Basic review reports such as this play an important role in helping to focus new research and make it productive.
Excellent job, folks! And, thanks!