
A Periodic Update of Semantic Web-related Research using Wikipedia

One of the more popular posts on this AI3 blog was a listing of 99 research articles that used Wikipedia in one way or another for semantic Web-related research. It was first posted in February 2008. Many of you suggested additions to that listing.

Wikipedia continues to be an effective and unique source for many information extraction and semantic Web purposes. We thus decided to make this listing a permanent resource of this blog and to update it on a periodic basis. If you have additions to this listing, please suggest them here and they will be added to the next periodic update.

UPDATE: This alphabetical listing now contains 246 articles, and was posted on Jan. 25, 2010. This update adds 19 articles since the last listing of 227 entries posted on Sept. 10, 2009. All new entries since the last update are noted with [NEW].







  • EachWiki, online system from Fu et al.
  • Maike Erdmann, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio, 2008. An Approach for Extracting Bilingual Terminology from Wikipedia, in Proceedings of the 13th International Conference on Database Systems for Advanced Applications (DASFAA). See http://www.cse.iitb.ac.in/~dasfaa08/research.htm.


  • A. Fader, S. Soderland and O. Etzioni. 2009. Scaling Wikipedia-based Named Entity Disambiguation to Arbitrary Web Text, in Proceedings of the Wiki-AI Workshop at IJCAI’09 Conf., Pasadena, CA, US.
  • Norberto Fernández, José M. Blázquez, Luis Sánchez and Vicente Luque, 2007. Semantic Annotation of Web Resources Using IdentityRank and Wikipedia, in Advances in Intelligent Web Mastering, Springer, 2007. See http://www.springerlink.com/content/w3944840l4061603/.
  • Sergio Ferrández, Antonio Toral, Óscar Ferrández, Antonio Ferrández and Rafael Muñoz, 2007. Applying Wikipedia’s Multilingual Knowledge to Cross-Lingual Question Answering, in Proceedings of the 12th International Conference on Applications of Natural Language to Information Systems, Paris, France, pp. 352-363, June 2007.
  • [NEW] Angela Fogarolli and Marco Ronchetti, 2008. Intelligent Mining and Indexing of Multi-Language e-Learning Material, in Proceedings of the 1st International Symposium on Intelligent Interactive Multimedia Systems and Services, KES IIMS 2008, 9-11 July 2008, Piraeus, Greece; Studies in Computational Intelligence, Springer-Verlag.
  • [NEW] Angela Fogarolli, 2009. Word Sense Disambiguation based on Wikipedia Link Structure, in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA – September 14-16. pp. 77-82. See http://www.angela-fogarolli.net/publications/ieee.pdf.
  • Angela Fogarolli and Marco Ronchetti, 2009. Extracting Semantics from Multimedia Content using Wikipedia, in the Special Issue of Scalable Computing: Practice and Experience, vol. 9, no. 4 (ISSN 1895-1767). See http://www.scpe.org/vols/vol09/no4/SCPE_9_4_03.pdf.
  • Linyun Fu, Haofen Wang, Haiping Zhu, Huajie Zhang, Yang Wang and Yong Yu, 2007. Making More Wikipedians: Facilitating Semantics Reuse for Wikipedia Authoring, at ISWC 2007. See http://iswc2007.semanticweb.org/papers/ISWC2007_RT_Fu.pdf.



  • Wei Che Huang, Andrew Trotman, and Shlomo Geva, 2007. Collaborative Knowledge Management: Evaluation of Automated Link Discovery in the Wikipedia, in SIGIR 2007 Workshop on Focused Retrieval, July 27, 2007, Amsterdam, The Netherlands. See http://www.cs.otago.ac.nz/sigirfocus/paper_15.pdf.





  • Miro Lehtonen and Antoine Doucet, 2007. EXTIRP: Baseline Retrieval from Wikipedia, in Comparative Evaluation of XML Information Retrieval Systems, pp. 115-120.
  • Yinghao Li, Wing Pong Robert Luk, Kei Shiu Edward Ho and Fu Lai Korris Chung, 2007. Improving Weak Ad-Hoc Queries Using Wikipedia As External Corpus, in Kraaij et al. (editors) Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’07, Amsterdam, The Netherlands, July 23-27, 2007, pp. 797-798. ACM Press.
  • Qiaoling Liu, Kaifeng Xu, Lei Zhang, Haofen Wang, Yong Yu and Yue Pan, 2008. Catriple: Extracting Triples from Wikipedia Categories, in Proceedings of the 3rd Asian Semantic Web Conference (ASWC 2008). See http://www.springerlink.com/content/w735w60105t3x7p2/.



  • David Nadeau, Peter D. Turney and Stan Matwin, 2006. Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity, at 19th Canadian Conference on Artificial Intelligence. Québec City, Québec, Canada. June 7, 2006. See http://www.iit-iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-48727.pdf. Doesn’t specifically use Wikipedia, but techniques are applicable.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2007a. Wikipedia Mining for an Association Web Thesaurus Construction, in Web Information Systems Engineering WISE 2007, Vol. 4831 (2007), pp. 322-334. See http://wikipedia-lab.org/en/images/9/90/Wise2007.pdf.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2007a. Wikipedia: A New Frontier for AI Researches, in Journal of the Japanese Society for Artificial Intelligence 22(5), pp. 693-701.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2007b. A Thesaurus Construction Method from Large Scale Web Dictionaries, in Proceedings of the 21st IEEE International Conference on Advanced Information Networking and Applications, AINA’07, May 21-23, 2007, Niagara Falls, Canada, pp. 932-939, IEEE Computer Society. See http://wikipedia-lab.org/en/images/a/a9/Aina2007.pdf.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2008a. A Search Engine for Browsing the Wikipedia Thesaurus, in Proceedings of the 13th International Conference on Database Systems for Advanced Applications, Demo session (DASFAA’08), pp. 690-693. See http://wikipedia-lab.org/en/images/a/a6/Dasfaa2008.pdf.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2008b. Wikipedia Mining – Wikipedia as a Corpus for Knowledge Extraction, in Proceedings of the Annual Wikipedia Conference (Wikimania 2008). See http://wikipedia-lab.org/en/images/0/06/Wikimania2008.pdf.
  • Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2008c. Wikipedia Link Structure and Text Mining for Semantic Relation Extraction, in Stephan Bloehdorn, Marko Grobelnik, Peter Mika and Duc Thanh Tran, eds., Proceedings of the Workshop on Semantic Search (SemSearch 2008) at the 5th European Semantic Web Conference (ESWC 2008), June 2, 2008, Tenerife, Spain. See http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-334/paper-05.pdf.
  • Kotaro Nakayama, Masahiro Ito, Takahiro Hara and Shojiro Nishio, 2008. Wikipedia Mining for Huge Scale Japanese Association Thesaurus Construction, in Workshop Proceedings of the 22nd International Conference on Advanced Information Networking and Applications, AINA’08, GinoWan, Okinawa, Japan, March 25-28, 2008, pp. 1150-1155, IEEE Computer Society.
  • Vivi Nastase, 2008. Topic-driven Multi-document Summarization with Encyclopedic Knowledge and Spreading Activation, in Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), October 25-27, 2008, Waikiki, Hawaii. See http://www.eml-research.de/english/homes/nastase/Publications/nastase08c.pdf.
  • Vivi Nastase and Michael Strube, 2008. Decoding Wikipedia Categories for Knowledge Acquisition, in Proceedings of the AAAI’08 Conference, Chicago, US, pp. 1219-1224. See http://www.eml-research.de/english/homes/nastase/Publications/nastase08b.pdf.
  • Vivi Nastase and Michael Strube, 2009. Combining Collocations, Lexical and Encyclopedic Knowledge for Metonymy Resolution, in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 910–918, Singapore, 6-7 August 2009. See http://www.aclweb.org/anthology/D/D09/D09-1095.pdf.
  • Rani Nelken and Elif Yamangil, 2008. Mining Wikipedia’s Article Revision History for Training Computational Linguistic Algorithms, in Proceedings of the WIKI-AI: Wikipedia and AI Workshop at the AAAI’08 Conference, Chicago, US. See http://eecs.harvard.edu/~elif/pubs/eggcorn.pdf.
  • Dat P.T. Nguyen, Yutaka Matsuo and Mitsuru Ishizuka, 2007a. Relation Extraction from Wikipedia Using Subtree Mining, in Proceedings of AAAI’07.
  • Dat P.T. Nguyen, Yutaka Matsuo and Mitsuru Ishizuka, 2007b. Subtree Mining for Relation Extraction from Wikipedia, in Proceedings of the HLT-NAACL 2007, pp. 125-128. See http://www.aclweb.org/anthology-new/N/N07/N07-2032.pdf.
  • Dat P.T. Nguyen, Yutaka Matsuo and Mitsuru Ishizuka, 2007c. Exploiting Syntactic and Semantic Information for Relation Extraction from Wikipedia, in Proceedings of the IJCAI Workshop on Text-Mining and Link-Analysis, TextLink’07. See http://www.miv.t.u-tokyo.ac.jp/papers/dat-IJCAI07-TextLinkWS.pdf.
  • Joel Nothman, James R. Curran and Tara Murphy, 2008. Transforming Wikipedia into Named Entity Training Data, Australian Language Technology Workshop. See http://www.alta.asn.au/events/alta2008/proceedings/pdf/ALTA2008_16.pdf.
  • Joel Nothman, 2008. Learning Named Entity Recognition from Wikipedia, Honours thesis, University of Sydney. See http://www.joelnothman.com/downloads/honsthesis.pdf.



  • Marius Pasca, 2009. Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies, in Proceedings of the 12th Conference of the European Chapter of the ACL, pages 639–647, Athens, Greece, 30 March – 3 April 2009. See http://www.aclweb.org/anthology/E/E09/E09-1073.pdf.
  • V. Pedro, R.S. Niculescu and L. Lita, 2008. Okinet: Automatic Extraction of a Medical Ontology From Wikipedia, in WikiAI’08: a Workshop of AAAI 2008. See http://www.aaai.org/Papers/Workshops/2008/WS-08-15/WS08-15-007.pdf.
  • Minghua Pei, Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2008a. Constructing a Global Ontology by Concept Mapping using Wikipedia Thesaurus, in Proceedings of the 22nd International Conference on Advanced Information Networking and Applications, AINA’08, GinoWan, Okinawa, Japan, March 25-28, 2008, pp. 1205-1210, IEEE Computer Society.
  • Minghua Pei, Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2008b. An Integrated Method for Web Resource Categorization, in Proceedings of IEEE International Symposium on Mining And Web (IEEE MAW).
  • Davide Picca and Adrian Popescu, 2007. Using Wikipedia and Supersense Tagging for Semi-automatic Complex Taxonomy Construction, in RANLP 2007, CALP workshop; see http://moromete.net/articles/picca_et_al_calp07_cr.pdf.
  • Simone Paolo Ponzetto and Michael Strube, 2006. Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution, in NAACL 2006. See http://www.eml-research.de/english/homes/strube/papers/naacl06.pdf.
  • Simone Paolo Ponzetto and Michael Strube, 2007a. Deriving a Large Scale Taxonomy from Wikipedia, in Association for the Advancement of Artificial Intelligence (AAAI2007).
  • Simone Paolo Ponzetto and Michael Strube, 2007b. Knowledge Derived From Wikipedia For Computing Semantic Relatedness, in Journal of Artificial Intelligence Research 30 (2007) 181-212. See also these PPT slides (in PDF format): Part I, Part II, and References.
  • Simone Paolo Ponzetto and Michael Strube, 2007c. An API for Measuring the Relatedness of Words in Wikipedia, in Companion Volume to the Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. See pages 49-52 within http://acl.ldc.upenn.edu/P/P07/P07-2.pdf.
  • Simone Paolo Ponzetto, 2007. Creating a Knowledge Base from a Collaboratively Generated Encyclopedia, in Proceedings of the NAACL-HLT 2007 Doctoral Consortium, pp. 9-12, Rochester, NY, April 2007. See http://www.aclweb.org/anthology-new/N/N07/N07-3003.pdf.
  • Simone Paolo Ponzetto and Roberto Navigli, 2009. Large-Scale Taxonomy Mapping for Restructuring and Integrating Wikipedia, in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena, CA, July 2009. See http://ijcai.org/papers09/Papers/IJCAI09-343.pdf.
  • Martin Potthast, 2007. Wikipedia In The Pocket: Indexing Technology for Near-Duplicate Detection and High Similarity Search, in Proceedings of the 30th International ACM SIGIR Conference on Research and Development in Information Retrieval. See http://www.uni-weimar.de/medien/webis/publications/downloads/papers/potthast_2007.pdf.
  • [NEW] Andrea Prato and Marco Ronchetti, 2009. Using Wikipedia as a Reference for Extracting Semantic Information from a Text, in The Third International Conference on Advances in Semantic Processing SEMAPRO 2009, Malta.






  • Eugenio Tacchini, Andreas Schultz and Christian Bizer, 2009. Experiments with Wikipedia Cross-Language Data Fusion, in 5th Workshop on Scripting and Development for the Semantic Web, 31st May, 2009, Crete, Greece. See http://www.semanticscripting.org/SFSW2009/full_3.pdf.
  • [NEW] Sam Tardif, James R. Curran and Tara Murphy, 2009. Improved Text Categorisation for Wikipedia Named Entities, presented at the Australasian Language Technology Workshop 2009, December 3-4, 2009, Sydney, Australia. See http://www.alta.asn.au/events/alta2009/pdf/ALTA2009_14.pdf.
  • [NEW] Texterra is a toolkit for text mining based on novel text processing methods that exploit semantics extracted from Wikipedia. It is designed to organize and monitor document collections without the expensive customization that contemporary systems require. See http://www.modis.ispras.ru/texterra/.




  • Marieke van Erp, Piroska Lendvai, and Antal van den Bosch, 2009. Comparing Alternative Data-Driven Ontological Vistas of Natural History, in Proceedings of the 8th International Conference on Computational Semantics, pages 282–285, Tilburg, January 2009. See http://aclweb.org/anthology-new/W/W09/W09-3728.pdf.
  • Benjamin Van Durme, Ting Qian and Lenhart Schubert, 2008. Class-Driven Attribute Extraction, in Proceedings of COLING. See http://aclweb.org/anthology/C/C08/C08-1116.pdf.
  • Anne-Marie Vercoustre, Jovan Pehcevski and James A. Thom, 2007. Using Wikipedia Categories and Links in Entity Ranking, in Pre-proceedings of the Sixth International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2007), Dec 17, 2007. See http://hal.inria.fr/docs/00/19/24/89/PDF/inex07.pdf.
  • Anne-Marie Vercoustre, James A. Thom and Jovan Pehcevski, 2008. Entity Ranking in Wikipedia, in SAC’08, March 16-20, 2008, Fortaleza, Ceara, Brazil. See http://arxiv.org/PS_cache/arxiv/pdf/0711/0711.3128v1.pdf.
  • Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller and Rudi Studer, 2006. Semantic Wikipedia, in Proceedings of the 15th International Conference on World Wide Web, WWW’06, Edinburgh, Scotland, May 23-26, 2006.
  • Jakob Voss, 2006. Collaborative Thesaurus Tagging the Wikipedia Way. ArXiv Computer Science e-prints, cs/0604036. See http://arxiv.org/abs/cs.IR/0604036.
  • Denny Vrandecic, Markus Krötzsch and Max Völkel, 2007. Wikipedia and the Semantic Web, Part II, in Phoebe Ayers and Nicholas Boalch, Proceedings of Wikimania 2006 – The Second International Wikimedia Conference, Wikimedia Foundation, Cambridge, MA, USA, August 2007. See http://wikimania2006.wikimedia.org/wiki/Proceedings:DV1.





  • Elif Yamangil and Rani Nelken, 2008. Mining Wikipedia Revision Histories for Improving Sentence Compression, in Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, June 15-20, 2008. See http://eecs.harvard.edu/~elif/pubs/wikicompress.pdf.
  • Yulan Yan, Naoaki Okazaki, Yutaka Matsuo, Zhenglu Yang and Mitsuru Ishizuka, 2009. Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web, in Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 1021–1029, Suntec, Singapore, 2-7 August 2009. See http://www.aclweb.org/anthology/P/P09/P09-1115.pdf.
  • Junghoon Yang, Jangwhan Han, Inseok Oh and Mingyung Kwak, 2007. Using Wikipedia Technology for Topic Maps Design, in Proceedings of the ACM Southeast Regional Conference, pp. 106-110. See http://doi.acm.org/10.1145/1233341.1233361.
  • Xiaofeng Yang and Jian Su, 2007. Coreference Resolution Using Semantic Relatedness Information from Automatically Discovered Patterns, in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, ACL’07, Prague, Czech Republic, pp. 528-535. See http://www.bootstrep.eu/pub/Extern/PublicationPage/acl0701.pdf.
  • Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre and Aitor Soroa, 2009. WikiWalk: Random Walks on Wikipedia for Semantic Relatedness, in Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, ACL-IJCNLP 2009, pages 41–49, Suntec, Singapore, 7 August 2009. See http://www.aclweb.org/anthology/W/W09/W09-3206.pdf.
  • [NEW] Gisle Ytrestøl, Dan Flickinger and Stephan Oepen, 2009. Extracting and Annotating Wikipedia Sub-Domains. See http://www.delph-in.net/wescience/tlt09.pdf.
  • Jonathan Yu, James A. Thom and Audrey Tam, 2007. Ontology Evaluation Using Wikipedia Categories for Browsing, in Proceedings of the 16th ACM Conference on Information and Knowledge Management, CIKM’07, Lisbon, Portugal, November 6-8, 2007, pp. 223-232. See http://goanna.cs.rmit.edu.au/~jyu/publications/YuEtal07.pdf.


Last Updated 1/25/10
