Evolution
AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Blogasbörd (cloud version):
Send Email   Get SIOC Profile   Get FOAF Profile   Syndicate full contents for this site using RSS 20
Main Links
Categories
Calendar
February 2013
S M T W T F S
« Jan    
 12
3456789
10111213141516
17181920212223
2425262728  
Archives
More . . .  
Credits
Blog software courtesy of WordPress Site Meter View Mike's profile on LinkedIn
6295
Search
Date:   February 18, 2008

W3C Semantic WebWikipedia

Most Comprehensive Reference List Available Shows Impressive Depth, Breadth

Since about 2005 — and at an accelerating pace — Wikipedia has emerged as the leading online knowledge base for conducting semantic Web and related research. The system is being tapped for both data and structure. Wikipedia has arguably replaced WordNet as the leading lexicon for concepts and relations. Because of its scope and popularity, many argue that Wikipedia is emerging as the de facto structure for classifying and organizing knowledge in the 21st century.

Our work on the UMBEL lightweight reference subject concept structure has stated since the project’s announcement in July 2007 that Wikipedia is a key intended resource for identifying subject concepts and entities. For the past few months I have been scouring the globe attempting to find every drop of research I could find on the use of Wikipedia for semantic Web, information extraction, categorization and related issues.

Thus, I’m pleased to offer up herein the most comprehensive such listing available anywhere: more than 99 resources and counting! (I say “more than” because some entries below have multiple resources; I just liked the sound of 99 as a round number!)

Wikipedia itself maintains a listing of academic studies using Wikipedia as a resource; fewer than one-third of the listings below are on that list (which itself may be an indication of the current state of completeness within Wikipedia). Some bloggers and other sources around the Web also maintain listings in lesser degrees of completeness.

It is well documented the tremendous growth of content and topics within Wikipedia (see, as examples, the W1, W2, W3, W4, W5, W6 and W7 internal Wikipedia sources for gory details), with as of early 2008 about 2.25 million articles in English and versions in 256 languages and variants.

Download access to the full knowledge base has enabled the development of notable core references to the Linked Data aspects of the semantic Web such as DBpedia [5,6] and YAGO [72,73]. Entire research teams, such as Ponzetto and Strube [61-65] (and others as well; see below) are moving toward creating a full-blown ontologies or structured knowledge bases useful for semantic Web purposes based on Wikipedia. So, one of the first and principle uses of Wikipedia to date has been as a data source of concepts, entities and relations.

But much broader data mining and text mining and analysis is being conducted against Wikipedia, that is currently defining the state-of-the-art in these areas, too:

  • Ontology development and categorization
  • Word sense disambiguation
  • Named entity recognition
  • Named entity disambiguation
  • Semantic relatedness and relations.

These objectives, in turn, are mining and extracting these various kinds of structure for these purposes in Wikipedia:

  • Articles
    • First paragraph — Definitions
    • Full text — Description of meaning; related terms; translations
    • Redirects — Synonymy; spelling variations, misspellings; abbreviations
    • Title — Named entities; domain specific terms or senses
    • Subject — Category suggestion (phrase marked in bold or in first paragraph)
    • Section heading — Category suggestions
  • Article links
    • Context — Related terms; co-occurrences
    • Label — Synonyms; spelling variations; related terms
    • Target — Link graph; related terms
    • LinksTo — Category suggestion
    • LinkedBy — Category suggestion
  • Categories
    • Category — Category suggestion
    • Contained articles — Semantically related terms (siblings)
    • Hierarchy — Hyponymic and meronymic relations between terms
  • Disambiguation pages
    • Article links — Sense inventory
  • Infobox Templates
    • Name –
    • Item — Category suggestion; entity suggestion
  • Lists
    • Hyponyms

These are some of the specific uses that are included in the 99 resources listed below.

This is an exciting (and, for most all of us just a few years back, unanticipated) use of the Web in socially relevant and contextual knowledge and research. I’m sure such a listing one year by now will be double in size or larger!

BTW, suggestions for new or overlooked entries are very much welcomed! :)

A – B

  1. Sisay Fissaha Adafre and Maarten de Rijke, 2006. Finding Similar Sentences across Multiple Languages in Wikipedia, in EACL 2006 Workshop on New Text–Wikis and Blogs and Other Dynamic Text Sources, April 2006.See http://www.science.uva.nl/~mdr/Publications/Files/eacl2006-similarsentences.pdf.
  2. Sisay Fissaha Adafre, V. Jijkoun and M. de Rijke. Fact Discovery in Wikipedia, in 2007 IEEE/WIC/ACM International Conference on Web Intelligence. See http://staff.science.uva.nl/~mdr/Publications/Files/wi2007.pdf.
  3. Sisay Fissaha Adafre and Maarten de Rijke, 2005. Discovering Missing Links in Wikipedia, in LinkKDD 2005, August 21, 2005, Chicago, IL. See http://data.isi.edu/conferences/linkkdd-05/Download/Papers/linkkdd05-13.pdf.
  4. David Ahn, Valentin Jijkoun, Gilad Mishne, Karin Müller, Maarten de Rijke, and Stefan Schlobach. 2004. Using Wikipedia at the TREC QA Track, in Proceedings of TREC 2004.
  5. Sören Auer, Chris Bizer, Jens Lehmann, Georgi Kobilarov, Richard Cyganiak and Zachary Ives, 2007. DBpedia: A nucleus for a web of open data, in Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC2007), Busan, South Korea, volume 4825 of LNCS, pages 715–728, November 2007. See http://iswc2007.semanticweb.org/papers/ISWC2007_IU_Auer.pdf.
  6. Sören Auer and Jens Lehmann, 2007. What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content, in The Semantic Web: Research and Applications, pages 503-517, 2007. See http://www.eswc2007.org/pdf/eswc07-auer.pdf.
  7. Somnath Banerjee, 2007. Boosting Inductive Transfer for Text Classification Using Wikipedia, at the Sixth International Conference on Machine Learning and Applications (ICMLA). See http://portal.acm.org/citation.cfm?id=1336953.1337115.
  8. Somnath Banerjee, Krishnan Ramanathan, Ajay Gupta, 2007. Clustering Short Texts using Wikipedia, poster presented at Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, pp. 787-788.
  9. F. Bellomi and R. Bonato, 2005. Lexical Authorities in an Encyclopedic Corpus: A Case Study with Wikipedia, online reference not found.
  10. F. Bellomi and R. Bonato, 2005. Network Analysis for Wikipedia, presented at Wikimania 2005; see http://www.fran.it/articles/wikimania_bellomi_bonato.pdf
  11. Bibauw (2005) analyzed the lexicographical structure of Wiktionary (in French).
  12. Abhijit Bhole, Blaž Fortuna, Marko Grobelnik and Dunja Mladenić, 2007.Mining Wikipedia and Relating Named Entities over Time, see http://www.cse.iitb.ac.in/~abhijit.bhole/SiKDD2007ExtractingWikipedia.pdf.
  13. Razvan Bunescu and Marius Pasca. 2006. Using Encyclopedic Knowledge for Named Entity Disambiguation, in Proceedings of the 11th Conference of the EACL, pages 9-16, Trento, Italy.
  14. Razvan Bunescu, 2007. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction, Ph.D. thesis for the University of Texas, August 2007, 168 pp. See http://oucsace.cs.ohiou.edu/~razvan/papers/thesis-white.pdf.
  15. Luciana Buriol, Carlos Castillo, Debora Donato, Stefano Leonardi, and Stefano Millozzi. 2006. Temporal Analysis of the Wikigraph, in Proceedings of Web Intelligence, Hong Kong.
  16. Davide Buscaldi and Paolo Rosso, 2007. A Comparison of Methods for the Automatic Identification of Locations in Wikipedia, in Proceedings of GIR’07, November 9, 2007, Lisbon, Portugal, pp 89-91. See http://www.dsic.upv.es/~prosso/resources/BuscaldiRosso_GIR07.pdf.

C – F

  1. Ruiz-Casado, M., Alfonseca, E., and Castells, P. 2005. Automatic Assignment of Wikipedia Encyclopedic Entries to WordNet Synsets, in AWIC, pages 380-386. See http://nets.ii.uam.es/publications/nlp/awic05.pdf.
  2. Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells, 2006. From Wikipedia to Semantic Relationships: a Semi-automated Annotation Approach, in ESWC2006.
  3. Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells, 2007. Automatising the Learning of Lexical Patterns: an Application to the Enrichment of WordNet by Extracting Semantic Relationships from Wikipedia. See http://nets.ii.uam.es/publications/nlp/dke07.pdf.
  4. Sergey Chernov, Tereza Iofciu, Wolfgang Nejdl, and Xuan Zhou, 2006. Extracting Semantic Relationships between Wikipedia Categories, from SEMWIKI 2006.
  5. Aron Culotta, Andrew McCallum and Jonathan Betz, 2006. Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text, in Proceedings of HLT-NAACL-2006. See http://www.cs.umass.edu/~culotta/pubs/culotta06integrating.pdf
  6. Silviu Cucerzan, 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). See http://www.aclweb.org/anthology-new/D/D07/D07-1074.pdf.
  7. Cyclopedia, from the Cyc Foundation, an online version of Wikipedia that enables browsing the encyclopedia by concepts. See http://www.cycfoundation.org/blog/?page_id=15.
  8. Wisam Dakka and Silviu Cucerzan, 2008. Augmenting Wikipedia with Named Entity Tags, to be published in IJCNLP. See http://research.microsoft.com/users/silviu/Papers/np-ijcnlp08.pdf.
  9. Wisam Dakka and Silviu Cucerzan, 2008. Also, see the online tool http://wikinet.stern.nyu.edu/ (not yet implemented).
  10. Turdakov Denis, 2007. Recommender System Based on User-generated Content. See http://syrcodis.citforum.ru/2007/9.pdf.
  11. EachWiki, online system from Fu et al.
  12. Linyun Fu, Haofen Wang, Haiping Zhu, Huajie Zhang, Yang Wang and Yong Yu, 2007. Making More Wikipedians: Facilitating Semantics Reuse for Wikipedia Authoring, at ISWC 2007. See http://iswc2007.semanticweb.org/papers/ISWC2007_RT_Fu.pdf.

G – H

  1. Evgeniy Gabrilovich and Shaul Markovitch. 2006. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge, in AAAI, pages 1301-1306, Boston, MA.
  2. Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007.
  3. Evgeniy Gabrilovich, 2006. Feature Generation for Textual Information Using World Knowledge, Ph.D. Thesis for The Technion – Israel Institute of Technology, Haifa, Israel, December 2006, 218 pp. See http://www.cs.technion.ac.il/~gabr/papers/phd-thesis.pdf. (There is also an informative video at http://www.researchchannel.org/prog/displayevent.aspx?rID=4915).)
  4. See also Gabrilovich’s Perl Wikipedia tool, WikiPrep.
  5. Rudiger Gleim, Alexander Mehler and Matthias Dehmer, 2007. Web Corpus Mining by Instance of Wikipedia, in Proc. 2nd Web as Corpus Workshop at EACL 2006. See http://acl.ldc.upenn.edu/W/W06/W06-1710.pdf.
  6. Andrew Gregorowicz and Mark A. Kramer, 2006. Mining a Large-Scale Term-Concept Network from Wikipedia, Mitre Technical Report, October 2006. See http://www.mitre.org/work/tech_papers/tech_papers_06/06_1028/06_1028.pdf.
  7. Andreas Harth, Hannes Gassert, Ina O’Murchu, John Breslin and Stefan Decker, 2005. WikiOnt: An Ontology for Describing and Exchanging Wiki Articles, presented at Wikimania, Frankfurt, 5th August 2005. See http://sw.deri.org/~jbreslin/presentations/20050805a.pdf.
  8. Martin Hepp, Daniel Bachlechner and Katharina Siorpaes, 2006. Harvesting Wiki Consensus – Using Wikipedia Entries as Ontology Elements, See http://www.heppnetz.de/files/SemWiki2006-Harvesting%20Wiki%20Consensus-LNCS-final.pdf.
  9. Martin Hepp, Katharina Siorpaes, Daniel Bachlechner, 2007. Harvesting Wiki Consensus: Using Wikipedia Entries as Vocabulary for Knowledge Management, in IEEE Internet Computing, Vol. 11, No. 5, pp. 54-65, Sept-Oct 2007. See http://www.heppnetz.de/files/hepp-siorpaes-bachlechner-harvesting%20wikipedia%20w5054.pdf. Also, sample data at http://www.heppnetz.de/harvesting-wikipedia/.
  10. A. Herbelot and Ann Copestake, 2006. Acquiring Ontological Relationships from Wikipedia Using RMRS, in Proc. International Semantic Web Conference 2006 Workshop, Web Content Mining with Human Language Technologies, Athens, GA, 2006. See http://orestes.ii.uam.es/workshop/12.pdf.
  11. Ryuichiro Higashinaka, Kohji Dohsaka and Hideki Isozaki, 2007. Learning to Rank Definitions to Generate Quizzes for Interactive Information Presentation, in Companion Volume to the Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; see pages 117-120 within http://acl.ldc.upenn.edu/P/P07/P07-2.pdf
  12. Todd Holloway, Miran Bozicevic, and Katy Börner. 2005. Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors. ArXiv Computer Science e-prints, cs/0512085.
  13. Wei Che Huang, Andrew Trotman, and Shlomo Geva, 2007. Collaborative Knowledge Management: Evaluation of Automated Link Discovery in the Wikipedia, in SIGIR 2007 Workshop on Focused Retrieval, July 27, 2007, Amsterdam, The Netherlands. See http://www.cs.otago.ac.nz/sigirfocus/paper_15.pdf.

I – K

  1. Jonathan Isbell and Mark H. Butler, 2007. Extracting and Re-using Structured Data from Wikis, Hewlett-Packard Technical Report HPL-2007-182, 14th November, 2007, 22 pp. See http://www.hpl.hp.com/techreports/2007/HPL-2007-182.pdf.
  2. Maciej Janik and Krys Kochut, 2007. Wikipedia in Action: Ontological Knowledge in Text Categorization, University of Georgia, Computer Science Department Technical Report no. UGA-CS-TR-07-001. See http://lsdis.cs.uga.edu/~mjanik/UGA-CS-TR-07-001.pdf.
  3. Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath and Gerhard Weikum, 2007. NAGA: Searching and Ranking Knowledge, Technical Report, Max-Planck-Institut f¨ur Informatik, MPI–I–2007–5–001, March 2007, 42 pp.See http://www.mpi-inf.mpg.de/~kasneci/naga/report.pdf
  4. Jun’ichi Kazama and Kentaro Torisawa, 2007. Exploiting Wikipedia as External Knowledge for Named Entity Recognition, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 698–707, Prague, June 2007.See http://acl.ldc.upenn.edu/D/D07/D07-1073.pdf.
  5. Daniel Kinzler, 2005. WikiSense Mining the Wiki, V 1.1, presented at Wikimania 2005, 10 pp. See http://brightbyte.de/repos/papers/2005/WikiSense-Presentation.pdf.
  6. A. Krizhanovsky, 2006. Synonym Search in Wikipedia: Synarcher, in 11th International Conference “Speech and Computer” SPECOM’2006. Russia, St. Petersburg, June 25-29, 2006, pp. 474-477. See http://arxiv.org/abs/cs/0606097 (also PDF).
  7. Natalia Kozlova, 2005. Automatic Ontology Extraction for Document Classification, a Master’s Thesis for Saarland University, February 2005, 90 pp. See http://domino.mpi-inf.mpg.de/imprs/imprspubl.nsf/80255f02006a559a80255ef20056fc02/3b864d86612739b0c1256fb70042547a/$FILE/Masterarbeit-Kozlova-Nat-2005.pdf. [READ]
  8. M. Krötzsch, D. Vrandečić and M. Völkel, 2005. Wikipedia and the Semantic Web: the Missing Links, at Wikimania 2005.Wikipedia and the Semantic Web: the Missing Links, at Wikimania 2005. See http://citeseer.ist.psu.edu/cache/papers/cs2/143/http:zSzzSzwww.aifb.uni-karlsruhe.dezSzWBSzSzmakzSzpubzSzwikimania.pdf/krotzsch05wikipedia.pdf.

L – N

  1. Rada Mihalcea and Andras Csomai, 2007. Wikify! Linking Documents to Encyclopedic Knowledge, in Proceedings of the Sixteenth ACM conference on Conference on information and knowledge management CIKM ’07 , November 6-8, 2007, pp. 233-241. See ACM portal retrieval (http://portal.acm.org/citation.cfm?id=1321475&coll=Portal&dl=ACM&CFID=8333672&CFTOKEN=48251251).
  2. Rada Mihalcea, 2007. Using Wikipedia for Automatic Word Sense Disambiguation, in Proceedings of NAACL HLT 2007, pages 196–203, April 2007.See http://www.cs.unt.edu/~rada/papers/mihalcea.naacl07.pdf.
  3. David Milne, 2007. Computing Semantic Relatedness using Wikipedia Link Structure; see also the Wikipedia Miner Toolkit (http://sourceforge.net/projects/wikipedia-miner/) provided by the author
  4. D. Milne, O. Medelyan and I. H. Witten, 2006. Mining Domain-Specific Thesauri from Wikipedia: A Case Study, in Proceedings of the International Conference on Web Intelligence (IEEE/WIC/ACM WI’2006), Hong Kong. Also Olena Medelyan’s home page, http://www.cs.waikato.ac.nz/~olena/.
  5. David Milne, Ian H. Witten and David M. Nichols, 2007. A Knowledge-Based Search Engine Powered by Wikipedia, at CIKM ’07.
  6. Lev Muchnik, Royi Itzhack, Sorin Solomon and Yoram Louzoun, 2007. Self-emergence of Knowledge Trees: Extraction of the Wikipedia Hierarchies, in Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), Vol. 76, No. 1.
  7. Nadeau, D., Turney, P., Matwin, S., 2006. Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity, at 19th Canadian Conference on Artificial Intelligence. Québec City, Québec, Canada. June 7, 2006.See http://www.iit-iti.nrc-cnrc.gc.ca/iit-publications-iti/docs/NRC-48727.pdf. Doesn’t specifically use Wikipedia, but techniques are applicable.
  8. Kotaro Nakayama, Takahiro Hara and Shojiro Nishio, 2007. Wikipedia Mining for an Association Web Thesaurus Construction, in Web Information Systems Engineering – WISE 2007, Vol. 4831 (2007), pp. 322-334.
  9. Dat P.T. Nguyen, Yutaka Matsuo and Mitsuru Ishizuka, 2007. Relation Extraction from Wikipedia Using Subtree Mining, from AAAI ’07.

O – P

  1. Yann Ollivier and Pierre Senellar, 2007. Finding Related Pages Using Green Measures: An Illustration with Wikipedia, in Proceedings of the AAAI-07 Conference. See http://pierre.senellart.com/publications/ollivier2006finding.pdf. See also http://pierre.senellart.com/publications/ollivier2006finding/ for tools and data.
  2. Simon Overell and Stefan Ruger, 2006. Identifying and Grounding Descriptions of Places, in SIGIR Workshop on Geographic Information Retrieval, pages 14–16, 2006.See http://mmis.doc.ic.ac.uk/www-pub/sigir06-GIR.pdf.
  3. Simone Paolo Ponzetto and Michael Strube, 2006. Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution, in NAACL 2006. See http://www.eml-research.de/english/homes/strube/papers/naacl06.pdf.
  4. Simone Paolo Ponzetto and Michael Strube, 2007a. Deriving a Large Scale Taxonomy from Wikipedia, in Association for the Advancement of Artificial Intelligence (AAAI2007).
  5. Simone Paolo Ponzetto and Michael Strube, 2007b. Knowledge Derived From Wikipedia For Computing Semantic Relatedness, in Journal of Artificial Intelligence Research 30 (2007) 181-212. See also these PPT slides (in PDF format): Part I, Part II, and References.
  6. Simone Paolo Ponzetto and Michael Strube, 2007c. An API for Measuring the Relatedness of Words in Wikipedia, in Companion Volume to the Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, See pages 49-52 within http://acl.ldc.upenn.edu/P/P07/P07-2.pdf
  7. Simone Paolo Ponzetto, 2007. Creating a Knowledge Base from a Collaboratively Generated Encyclopedia, in Proceedings of the NAACL-HLT 2007 Doctoral Consortium, pp 9-12, Rochester, NY, April 2007. See http://www.aclweb.org/anthology-new/N/N07/N07-3003.pdf.

R – S

  1. Tyler Riddle. 2006. Parse::mediawikidump. URL http://search.cpan.org/~triddle/Parse-MediaWikiDump-0.40/.
  2. Ralf Schenkel, Fabian Suchanek and Gjergji Kasneci, 2007. YAWN: A Semantically Annotated Wikipedia XML Corpus, in BTW2007.
  3. Péter Schönhofen, 2006. Identifying Document Topics Using the Wikipedia Category Network, in Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 456-462, 2006.See http://amber.exp.sis.pitt.edu/gale/paper/Identifying%20document%20topics%20using%20the%20Wikipedia%20category%20network.ppt.
  4. Börkur Sigurbjörnsson, Jaap Kamps, and Maarten de Rijke. 2006. Focused Access to Wikipedia. In Proceedings DIR-2006.
  5. Suzette Kruger Stoutenburg, 2007. Research Proposal: Acquiring the Semantic Relationships of Links between Wikipedia Articles, class proposal to http://www.coloradostoutenburg.com/cs589%20stoutenburg%20project%20proposal%20v1.doc
  6. Strube, M. and Ponzetto, S. P. 2006. WikiRelate! Computing Semantic Relatedness Using Wikipedia, in AAAI, pages 1419-1424, Boston, MA.
  7. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum, Yago – A Core of Semantic Knowledge, in 16th international World Wide Web Conference (WWW 2007), Banff, Alberta. See http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/.
  8. Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum, 2007. Yago: A Large Ontology from Wikipedia and WordNet, in Technical Report, submitted to the Elsevier Journal of Web Semantics 67 pp. See http://www.mpi-inf.mpg.de/~suchanek/publications/yagotr.pdf.
  9. Fabian M. Suchanek, Georgiana Ifrim and Gerhard Weikum, 2006. Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents, in Knowledge Discovery and Data Mining (KDD 2006). See http://www.mpi-inf.mpg.de/~ifrim/publications/kdd2006.pdf.
  10. S. Suh, H. Halpin and E. Klein, 2006. Extracting Common Sense Knowledge from Wikipedia, in Proc. International Semantic Web Conference 2006 Workshop, Web Content Mining with Human Language Technologies, Athens, GA, 2006. See http://orestes.ii.uam.es/workshop/22.pdf.
  11. Zareen Syed, Tim Finin, and Anupam Joshi, 2008. Wikipedia as an Ontology for Describing Documents, from Proceedings of the Second International Conference on Weblogs and Social Media, AAAI, March 31, 2008. See http://ebiquity.umbc.edu/paper/html/id/383/Wikipedia-as-an-Ontology-for-Describing-Documents.

T – V

  1. J. A. Thom, J, Pehcevski and A. M. Vercoustre, 2007. Use of Wikipedia Categories in Entity Ranking. in Proceedings of the 12th Australasian Document Computing Symposium, Melbourne, Australia (2007). See http://arxiv.org/PS_cache/arxiv/pdf/0711/0711.2917v1.pdf.
  2. Antonio Toral and Rafael Muñoz, 2007. Towards a Named Entity Wordnet (NEWN), in Proceedings of the 6th International Conference on Recent Advances in Natural Language Processing (RANLP). Borovets (Bulgaria). pp. 604-608 . September 2007.See http://www.dlsi.ua.es/~atoral/publications/2007_ranlp_newn_poster.pdf.
  3. Antonio Toral and Rafael Muñoz, 2006. A Proposal to Automatically Build and Maintain Gazetteers for Named Entity Recognition by using Wikipedia, in Workshop on New Text, 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento (Italy). April 2006.See http://www.dlsi.ua.es/~atoral/publications/2006_eacl-newtext_wiki-ner_paper.pdf.
  4. Anne-Marie Vercoustre, Jovan Pehcevski and James A. Thom, 2007. Using Wikipedia Categories and Links in Entity Ranking, in Pre-proceedings of the Sixth International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2007), Dec 17, 2007. See http://hal.inria.fr/docs/00/19/24/89/PDF/inex07.pdf.
  5. Anne-Marie Vercoustre, James A. Thom and Jovan Pehcevski, 2008. Entity Ranking in Wikipedia, in SAC’08 March 16-20, 2008, Fortaleza, Ceara, Brazil. See http://arxiv.org/PS_cache/arxiv/pdf/0711/0711.3128v1.pdf.
  6. Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller and Rudi Studer, 2006. Semantic Wikipedia, in Proceedings of WWW2006, pp 585-594.See http://www.aifb.uni-karlsruhe.de/WBS/hha/papers/SemanticWikipedia.pdf
  7. Jakob Voss. 2006. Collaborative Thesaurus Tagging the Wikipedia Way. ArXiv Computer Science e-prints, cs/0604036. See http://arxiv.org/abs/cs.IR/0604036
  8. Denny Vrandecic, Markus Krötzsch and Max Völkel, 2007. Wikipedia and the Semantic Web, Part II, in Phoebe Ayers and Nicholas Boalch, Proceedings of Wikimania 2006 – The Second International Wikimedia Conference, Wikimedia Foundation, Cambridge, MA, USA, August 2007.See http://wikimania2006.wikimedia.org/wiki/Proceedings:DV1.

W – Z

  1. Wang Y , Wang H , Zhu H , Yu Y, 2007. Exploit Semantic Information for Category Annotation Recommendation in Wikipedia, in Natural Language Processing and Information Systems (2007), pp. 48-60.
  2. Gang Wang, Yong Yu and Haiping Zhu, 2007. PORE: Positive-Only Relation Extraction from Wikipedia Text, at ISWC 2007, See http://iswc2007.semanticweb.org/papers/ISWC2007_RT_Wang(1).pdf.
  3. Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto, 2007. A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields, in EMNLP-CoNLL 2007, 29th June 2007, Prague, Czech. Has a useful CRF overview. See http://www.aclweb.org/anthology-new/D/D07/D07-1068.pdf and http://cl.naist.jp/~yotaro-w/papers/2007/emnlp2007.ppt.
  4. Timothy Weale, 2006. Utilizing Wikipedia Categories for Document Classification.
  5. Nicolas Weber and Paul Buitelaar, 2006. Web-based Ontology Learning with ISOLDE. See http://orestes.ii.uam.es/workshop/4.pdf.
  6. Wikify!, online service to automatically turn selected keywords into Wikipedia-style links. See http://www.wikifyer.com/.
  7. Wikimedia Foundation. 2006. Wikipedia. URL http://en.wikipedia.org/wiki/Wikipedia:Searching.
  8. Wikipedia, full database. See http://download.wikimedia.org.
  9. Wikipedia, general references. See especially (but only a small subset) for the Semantic Web, RDF, RDF Schema, OWL, SPARQL, GRDDL, W3C, Linked Data, many, many ontologies and controlled vocabularies such as FOAF, SKOS, SIOC, Dublin Core, etc; description logic areas (such as FOL), etc., etc. etc.
  10. Wikipedia, (as a) research source; see http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_in_academic_studies.
  11. Fei Wu and Daniel S. Weld, 2007. Autonomously Semantifying Wikipedia.
  12. Hugo Zaragoza, Henning Rode, Peter Mika, Jordi Atserias, Massimiliano Ciaramita & Giuseppe Attardi, 2007. Ranking Very Many Typed Entities on Wikipedia, in CIKM ’07: Proceedings of the Sixteenth ACM International Conference on Information and Knowledge Management. See http://grupoweb.upf.es/hugoz/pdf/zaragoza_CIKM07.pdf.
  13. Torsten Zesch, Iryna Gurevych, Max Mühlhäuser, 2007. Analyzing and Accessing Wikipedia as a Lexical Semantic Resource, and the longer technical report. See http://www.ukp.tu-darmstadt.de/software/JWPL.
  14. Torsten Zesch and Iryna Gurevych, 2007. Analysis of the Wikipedia Category Graph for NLP Applications, in Proceedings of the TextGraphs-2 Workshop (NAACL-HLT).
  15. Vinko Zlatic, Miran Bozicevic, Hrvoje Stefancic, and Mladen Domazet. 2006. Wikipedias: Collaborative Web-based Encyclopedias as Complex Networks, in Physical Review E, 74:016115.

Posted by AI3's author, Mike Bergman

Posted on February 18, 2008 at 12:46 pm in Adaptive Information, Open Source, Semantic Web, Semantic Web Tools, Structured Web, UMBEL | Comments (18)
The URI link reference to this post is: http://www.mkbergman.com/417/99-wikipedia-sources-aiding-the-semantic-web/
The URI to trackback this post is: http://www.mkbergman.com/417/99-wikipedia-sources-aiding-the-semantic-web/trackback/
Date:   January 28, 2008

Big GraphCytoscape Thumbnail

Where Has the Biology Community Been Hiding this Gem?

Jewels & DoubloonsI still never cease to be amazed at how wonderful and powerful tools are so often and easily overlooked. The most recent example is Cytoscape, a winner in our recent review of more than 25 tools for large-scale RDF graph visualization.

We began this review because the UMBEL subject concept “backbone” ontology will involve literally thousands of concepts. Graph visualization software suitable to very large graphs would aid UMBEL’s construction and refinement.

Cytoscape describes itself as a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. Cytoscape is partially based on GINY and Piccolo, among other open-source toolkits. What is more important to our immediate purposes, however, is that its design also lends itself well to general network and graph manipulation.

Cytoscape was first brought to our attention by François Belleau of Bio2RDF.org. Thanks François, and also for the strong recommendation and tips. Special thanks are also due to Frédérick Giasson of Zitgist for his early testing and case examples. Thanks, Fred!

Requirements

We had a number of requirements and items on our wish list prior to beginning our review. We certainly did not expect most or all of these items to be met:

  • Large scale – the UMBEL graph will likely have about 20,000 nodes or so; we would also like to be able to scale to instance graphs of hundreds of thousands or millions of nodes. For example, here is one representation of the full UMBEL graph (with nodes in pink-orange and different colored lines representing different relationships or predicates):
Full UMBEL Graph
  • Graph filtering – the ability to filter out the graph display by attribute, topology, selected nodes or other criteria. Again, here is an example using the ‘Organic’ layout produced by selecting on the Music node in UMBEL (click for full size):
Music Sub-graph, 'Organic' Layout
  • Graph analysis – the ability to analyze edge (or relation) lengths, cyclic aspects, missing nodes, imbalances across the full graph, etc.
  • Extensibility – the ability to add new modules or plugins to the system
  • Support for RDF – the ease for direct incorporation of RDF graphs
  • Graph editing – the interactive ability to add, edit or modify nodes and relations, to select colors and display options, to move nodes to different locations, cut-and-past operations and other standard edits, and
  • Graph visualization – the ease of creating sub-graphs and to plot the graphs with a variety of layout options.

Cytoscape met or exceeded our wish list in all areas save one: it does not support direct ingest of RDF (other than some pre-set BioPAX formats). However, that proved to be no obstacle because of the clean input format support of the tool. Simple parsing of triples into a CSV file is sufficient for input. Moreover, as described below, there are other cool attribute management functions that this clean file format supports as well.

Features and Attractions

The following screen shot shows the major Cytoscape screen. We will briefly walk through some of its key views (click for full size):

Cytoscape-UMBEL Main Screen

This Java tool has a fairly standard Eclipse-like interface and design. The main display window (A) shows the active portion of the current graph view. (Note that in this instance we are looking at a ‘Spring’ layout for the same Music sub-graph presented above.) Selections can easily be made in this main display (the red box) or by directly clicking on a node. The display itself represents a zoom (B) of the main UMBEL graph, which can also be easily panned (the blue box on B) or itself scaled (C). Those items that are selected in the main display window also appear as editable nodes or edges and attributes in the data editing view (D).

The appearance of the graph is fully editable via the VizMapper (E). An interesting aspect here is that every relation type in the graph (its RDF properties, or predicates) can be visually displayed in a different manner. The graphs or sub-graphs themselves can be selected, but also most importantly, the display can respond to a very robust and flexible filtering framework (F). Filters can be easily imported and can apply to nodes, edges (relations), the full graph or other aspects (depending on plugin). A really neat feature is the ability to search the graph in various flexible ways (G), which alters the display view. Any field or attribute can be indexed for faster performance.

In addition to these points, Cytoscape supports the following features:

  • Load and save previously-constructed interaction networks in GML format (Graph Markup Language)
  • Load and save networks and node/edge attributes in an XML document format called XGMML (eXtensible Graph Markup and Modeling Language)
  • Load and save arbitrary attributes on nodes and edges. For example, input a set of custom annotation terms or confidence values
  • Load and save state of the Cytoscape session in a Cytoscape Session (.cys) file. Cytoscape Session file includes networks, attributes (for node/edge/network), desktop states (selected/hidden nodes and edges, window sizes), properties, and visual styles (which are namable)
  • Customize network data display using powerful visual styles
  • Map node color, label, border thickness, or border color, etc. according to user-configurable colors and visualization schemes
  • Layout networks in two dimensions. A variety of layout algorithms are available, including cyclic and spring-embedded layouts
  • Zoom in/out and pan for browsing the network
  • Use the network manager to easily organize multiple networks, with this structure savable in a session file
  • Use the bird’s eye view to easily navigate large networks
  • Easily navigate large networks (100,000+ nodes and edges) by efficient rendering engine
  • Multiple plugins are available for areas such as subset selections, analysis, path analysis, etc. (see below).

Other Cytoscape Resources

The Cytoscape project also offers:

Unfortunately, other than these official resources, there appears to be a dearth of general community discussion and tips on the Web. Here’s hoping that situation soon changes!

Plugins

There is a broad suite of plugins available for Cytoscape, and directions to developers for developing new ones.

The master page also includes third-party plugins. The candidates useful to UMBEL and its graphing needs — also applicable to standard semantic Web applications — appear to be:

  • AgilentLiteratureSearch – creates a CyNetwork based on searching the scientific literature. Download from here
  • BubbleRouter – this plugin allows users to layout a network incrementally and in a semi-automated way. Bubble Router arranges specific nodes in user-drawn regions based on a selected attribute value. Bubble Router works with any node attribute file. Download from here
  • Cytoscape Plugin (Oracle) – enables a read/write interface between the Oracle database and the Cytoscape program. In addition, it also enables some network analysis functions from cytoscape. The README.txt file within the zipfile has instructions for installing and using this plugin. Download from here
  • DOT – interfaces with the GraphViz package for graph layout. The plugin now supports both simple and rank-cluster layouts. This software uses the dot layout routine from the graphviz opensource software developed at AT&T labs. Download from here
  • EnhancedSearch – performs search on multiple attribute fields. Download from here
  • HyperEdgeEditor – add, remove, and modify HyperEdges in a Cytoscape Network. Download from here
  • MCODE – MCODE finds clusters (highly interconnected regions) in a network. Clusters mean different things in different types of networks. For instance, clusters in a protein-protein interaction network are often protein complexes and parts of pathways, while clusters in a protein similarity network represent protein families. Download from here
  • MONET – is a genetic interaction network inference algorithm based on Bayesian networks, which enables reliable network inference with large-scale data(ex. microarray) and genome-scale network inference from expression data. Network inference can be finished in reasonable time with parallel processing technique with supercomputing center resources. This option may also be applicable to generic networks. Download from here
  • NamedSelection – this plugin provides the ability to “remember” a group of selected nodes. Download from here
  • NetworkAnalyzer – computes network topology parameters such as diameter, average number of neighbors, and number of connected pairs of nodes. It also displays diagrams for the distributions of node degrees, average clustering coefficients, topological coefficients, and shortest path lengths. Download: http://med.bioinf.mpi-inf.mpg.de/netanalyzer/index.html
  • SelConNet – is used to select the connected part of a network. Actually, running this plugin is like calling Select → Nodes → First neighbors of selected nodes many times until all the connected part of the network containing the selected nodes is selected. Download from here
  • ShortestPath – is a plugin for Cytoscape 2.1 and later to show the shortest path between 2 selected nodes in the current network. It supports both directed and undirected networks and it gives the user the possibility to choose which node (of the selected ones) should be used as source and target (useful for directed networks). The plugin API makes possible to use its functionality from another plugin. Download from here
  • sub-graph – is a flexible sub-graph creation, node identification, cycle finder, and path finder in directed and undirected graphs. It also has a function to select the p-neighborhood of a selected node or group of nodes which can be selected by pathway name, node type, or by a list in a file. This generates a set of plug-ins called: path and cycle finding, p-neighborhoods and sub-graph selection. Download from here.

Importantly, please note there is a wealth of biology- and molecular-specific plugins also available that are not included in the generic listing above.

Initial Use Tips

Our initial use of the tool suggests some use tips:

  • Try Cytoscape with the yFiles layouts; quicker to perform, and interesting results
  • Try the Organic yFile layout as one of the first
  • Try the search feature
  • Check the manual for examples of layouts
  • Holding the right mouse button down when in the main screen; moving the cursor from the center outward causes zoom in, from the exterior inward, to zoom out
  • Moving and panning nodes can be done in real time without issues
  • The “edge attribute browser” is really nice to find what node links to what other node by clicking on a link (so you don’t have to pan and check, etc)
  • Export to PDF often works best as an output display (though SVG is also supported)
  • If you select an edge and then Ctrl-left-click on the edge, an edge “handle” will appear. This handle can be used to change the shape of the line
  • Use the CSV file to make quick modifications, and then check it with the Organic layout
  • A convenient way to check the propagation of a network is to select a node, then click on Ctrl+6 again and again (Ctrl+6 selects neighborhood nodes of a selected node, so it “shows” you the network created by a node and its relationships)
  • If you want to analyze a sub-graph, search for a node, then press a couple of times on Ctrl+6, then create another graph from that selected node (File → New → Network → from selected node)
  • If you begin to see slow performance, then save and re-load your session; there appears to be some memory leaks in the program
  • Also, for very large graphs, avoid repeated use of certain layouts (Hierarchy, Orthogonal, etc.) that take very long times to re-draw.

Concluding Observations and Comments

Cytoscape was first released in 2002 and has undergone steady development since. Most recently, the 2.x and especially 2.3 versions forward have seen a flurry of general developments that have greatly broadened the tool’s appeal and capabilities. It was perhaps only these more recent developments that have positioned Cytoscape for broader use.

I suspect another reason that this tool has been overlooked by the general semWeb community is the fact that its sponsors have positioned it mostly in the biological space. Their short descriptor for the project, for example, is: Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data. That statement hardly makes it sound like a general tool!

Another reason for the lack of attention, of course, is the common tendency for different disciplines not to share enough information. Indeed, one reason for my starting the Sweet Tools listing was hopefully as a means of overcoming artificial boundaries and assembling relevant semantic Web tools in one central place.

Yet despite the product’s name and its positioning by sponsors, Cytoscape is indeed a general graph visualization tool, and arguably the most powerful one reviewed from our earlier list. Cytoscape can easily accommodate any generalized graph structure, is scalable, provides all conceivable visualization and modeling options, and has a clean extension and plugin framework for adding specialized functionality.

With just minor tweaks or new plugins, Cytoscape could directly read RDF and its various serializations, could support processing any arbitrary OWL or RDF-S ontology, and could support other specific semWeb-related tasks. As well, a tool like CPath (http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1660554), which enables querying of biological databases and then storing them in Cytoscape format, offers some tantalizing prospects for a general model for other Web query options.

For these reasons, I gladly announce Cytoscape as the next deserving winner of the (highly coveted, but cheesy! :) ) AI3 Jewels & Doubloons award.

Cytoscape’s sponsors — the U.S. National Institute of General Medical Sciences (NIGMS) of the National Institutes of Health (NIH), the U.S. National Science Foundation (NSF) and Unilever PLC — and its developers — the Institute for Systems Biology, the University of California – San Diego, the Memorial Sloan-Kettering Cancer Center, L’Institut Pasteur and Agilent Technologies – are to be heartily thanked for this excellent tool!

Jewels & Doubloons An AI3 Jewels & Doubloons Winner

Posted by AI3's author, Mike Bergman

Posted on January 28, 2008 at 6:45 pm in Adaptive Innovation, Bioinformatics, Jewels & Doubloons, Open Source, Semantic Web Tools, Structured Web, UMBEL | Comments (4)
The URI link reference to this post is: http://www.mkbergman.com/415/cytoscape-hands-down-winner-for-large-scale-graph-visualization/
The URI to trackback this post is: http://www.mkbergman.com/415/cytoscape-hands-down-winner-for-large-scale-graph-visualization/trackback/
Date:   September 24, 2007
Since the progression of WordPress beyond version 2.5x, the Advanced TinyMCE plug-in has reached its E.O.L. A better alternative that is being kept current is TinyMCE Advanced from Andrew Ozz. Advanced TinyMCE is still available for download for older WP versions.

New WordPress Release Imminent

WordPress LogoAccording to the WordPress codex, my Advanced TinyMCE Editor has been tested and found compatible with WordPress v. 2.3. (About 160 total plug-ins have been so verified.)

This new WP version adds enhanced support for tagging, among others (see Aaron Brazell’s 10 things you should know about this release).

I will be doing my own testing when a stable release is issued, but this is good news for this popular plug-in. You may get version 0.5 of Advanced TinyMCE from here.

Posted by AI3's author, Mike Bergman

Posted on September 24, 2007 at 9:00 am in Blogs and Blogging, Open Source | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/404/advanced-tinymce-editor-works-with-wordpress-v-23/
The URI to trackback this post is: http://www.mkbergman.com/404/advanced-tinymce-editor-works-with-wordpress-v-23/trackback/
Date:   September 11, 2007

rdf-zitgist-wordpress.png Zitgist’s Plug-in Exposes Linked Data for Hundreds of Thousands of WordPress Sites

Notice Anything New at the End of AI3‘s Links ??? (hint: )

The essence of the Web is the link. We use it to navigate, discover, form communities and get high rankings (or not!) for our Web pages on search engines. But, each link carries much more behind it than what has generally been exposed. That is, until now . . . .

Frédérick Giasson is a pragmatic innovator of the structured Web and semantic Web. Most recently, his efforts have included Ping the Semantic Web (that aggregates RDF published on the Web), the Zitgist semantic Web browser (that enables that RDF data to be viewed in useful ways), TalkDigger (for finding and sharing topical Web discussions), and efforts on a variety of ontologies, including jointly with me on UMBEL.

I have been an aggressive “linker” for some time and try to refer to Wikipedia often for definitions or background as well. Thus, Fred’s most recent efforts to continue to add value to the link as the basic coin of the Web realm really caught my eye.

What is zLinks?

In the early days of the Web, links were used solely to visit specific Web pages or locations within those documents. Somewhat later, actions such as searching or purchasing items could be associated with a link. Most recently, with the emergence of the semantic Web, the very nature of the link has become ambiguous, potentially representing any of the link’s former uses or either direct or indirect references to data and resources.

The Zitgist zLinks plug-in now makes these link uses explicit from within WordPress blogs.

Thus, we see that links can fulfill three different purposes, in rough order of their emergence:

  1. To visit Web pages and locations
  2. To potentially take actions (say, buy or search), and
  3. To retrieve data regarding resources.

The emergence of linked data and the semantic Web (or at least the provision of data via the structured Web) are making the use of the link more complicated and ambiguous. Moreover, sometimes a link is an indirect reference to where data exists, and not the actual resource itself.

What Zitgist’s zLinks does is to make these uses explicit and to remove ambiguities. Further, if a link is not to an actual resource but only a reference to it, zLinks resolves to the link’s correct destination. And, still further, a zLinks link is the gateway to still additional links from its reference destination, making the service a powerful jumping off point in the true spirit of the interlinked Web.

To my knowledge, zLinks is is the first and purest implementation of what Kingsley Idehen has termed the “enhanced anchor” or <a++>. RDFa and embedded RDF have similar objectives but are not premised on resolving the existing link.

Like the SIOC Import Plug-in, which imports SIOC metadata into a WordPress blog, the zLinks tool recognizes the importance of standard blogging software and automated background tools to expose data and capabilities. Since WordPress has many hundreds of thousands of site owners and bloggers — not to mention hundreds of millions of visitors — zLinks could be an important first exposure for many to the real power of linking and the semantic Web.

How Do You Use It?

As a site owner, zLinks works identically to other plug-ins: simply install it and then it works smoothly and easily.

As a site user who might encounter a zLinks icon in a WordPress blog, all you need to do is click on mouse over the zLinks launcher icon at the end of any visible link. You will first get an alert that the system is working, retrieving all of the necessary background link information. You will then get a popup showing the results, similar to this one for my own AI3 blog:

Sample Zitgist Browser Linker Popup

The zLinks popup offers direct and related links, with the icons and other associated information an indicator as to the nature of the link and its purpose. In our example case, I click on my name reference, which brings up my FOAF file in the Zitgist browser:

Example FOAF File from Zitgist Browser
[Click for full image]

Note how picture, mapping and other information is automatically “meshed” with my FOAF file. From this Zitgist browser location, I could obviously continue to explore still further links and relationships. In this manner, zLinks adds an entirely new dynamic dimension to the concept of ‘interlinking.’

If the initial zLinks link references data, that data is now resolved to its proper direct location, and is presented as RDF with further meshing and manipulation available. Other resources may take you directly to a Web page or perform other actions. Some of those actions, for example, may be to format data results in specific views (timelines, maps, charts, tables, graphs, structured reports, etc.). If the sources are data, the ability to make transformations or present the data in various views opens a rich horizon of options.

Tweaks and Caveats

I made some minor tweaks to the Zitgist distribution as provided. First, I replaced the initial link icon — – with this one –– that is smaller and more in keeping with my local WordPress theme. I did this simply by replacing the mini_rdf.gif image in the /public_html/wp-content/plugins/zitgist-browser-linker/imgs/ directory.

Then, also in keeping with my local theme, I made the text in the popup a bit smaller. I did this simply by adding a font-size: 80%; property to the style.css stylesheet in the /public_html/wp-content/plugins/zitgist-browser-linker/css/ directory.

And, that was it! Simple and sweet.

It is also important to realize that this is just a first-release prototype. Some initial bugs have been discovered and worked out, sometimes the server site is down, and longer-term potentialities are only now beginning to emerge. But, this is still professional software with much thought behind it and much potential in front of it. If it breaks, so what? It is free and it is fun.

Where Next?

To all of you out there new to RDF and structured, linked data, I say: Play and enjoy!

zLinks is only beginning to touch the most visible part of the iceberg. It is pretty clear that the use and usefulness of links are only now being understood. Harking back to the original listing of three possible uses for a link it is clear that “actions” and the use of the link itself as a referrer and “mini-banner” on the Web are still not appreciated, let alone exploited.

It is interesting that AdaptiveBlue has also come out with a SmartLinks approach that differs somewhat from the Zitgist approach (items and linkages are constructed and then referred to from a central location), but their screenshot does affirm the untapped potential of links.

The W3C semantic Web community continues to grapple with resource/link terminology and nuances, the implications of which will be deferred to another day and another blog entry. However, suffice it to say that with a growing ‘Web of data’ and linked data, not to mention the original document vision and then one of commerce and services, the once lowly link is growing mighty indeed!

Posted by AI3's author, Mike Bergman

Posted on September 11, 2007 at 1:57 pm in Adaptive Information, Blogs and Blogging, Information Automation, Open Source, Semantic Web Tools | Comments (5)
The URI link reference to this post is: http://www.mkbergman.com/400/sleek-zlinks-weaves-tightly-meshed-links-on-the-web/
The URI to trackback this post is: http://www.mkbergman.com/400/sleek-zlinks-weaves-tightly-meshed-links-on-the-web/trackback/
Date:   July 24, 2007

Huynh Adds to His Winning Series of Lightweight Structured Data Tools

David Huynh, a Ph.D. grad student developer par excellence from MIT’s Simile program, has just announced the beta availability of Potluck. Potluck allows casual users to mashup data on the Web using direct manipulation and simultaneous editing techniques, generally (but not exclusively!) based on Exhibit-powered pages.

Besides Potluck and Exhibit, David has also been the lead developer on such innovative Simile efforts as Piggy Bank, Timeline, Ajax, Babel, and Sifter, as well as a contributor to Longwell and Solvent. Each merits a look. Those familiar with these other projects will notice David’s distinct interface style in Potluck.

Taking Your First Bites

There is a helpful 6-min movie on Potluck that gives a basic overview of use and operation. I recommend you start here. Those who want more details can also read the Potluck paper in PDF, just accepted for presentation at ISWC 2007. And, after playing with the online demo, you can also download the beta source code directly from the Simile site.

Please note that Firefox is the browser of choice for this beta; Internet Explorer support is limited.

To invoke Potluck, you simply go to the demo page, enter two or more appropriate source URLs for mashup, and press Mix Data:

Potluck Entry Screen
[Click on image for full-size pop-up]

(You can also get to the movie from this page.)

Once the datasets are loaded, all fields from the respective sources are rendered as field tags. To combine different fields from different datasets, the respective field tags (color coded by dataset) to be matched are simply dragged to a new column. Differences in field value formats between datasets can be edited with an innovative approach to simultaneous group editing (see below). Once fields are aligned, they then may be assigned as browsing facets. The last step in working with the Potluck mashup is choosing either tabular or map views for the results display.

Potluck is designed to mashup existing Exhibit displays (JSON format), and is therefore lightweight in design. (Generally, Exhibit should be limited to about 500 data records or so per set.)

However, with the addition of the appropriate type name when specifying one of the sources to mash up, you can also use spreadsheet (xls), BibTeX, N3 or RDF/XML formats. The demo page contains a few sample data links. Additional sample data files for different mime types are (note entry using a space with type designator at end):

Besides the standard tabular display, you can also map results. For example, use the BibTeX example above and drop the “address” field into the first drop target area. Then, chose Map at the top of the display to get a mapping of conference locations.

In my own case, I mashed up this source and the xls sample on presidents, and then plotted out location in the US:

Example Potluck Map
[Click on image for full-size pop-up]

Given the capabilities in some of the other Simile tool sets, incorporating timelines or other views should be relatively straightforward.

Pragmatic Lessons and Cautions with Semantic Mashups

Different datasets name similar or identical things differently and characterize their data differently. You can’t combine data from different datasets without resolving these differences. These various heterogeneities — which by some counts can be 40 or so classes of possible differences — were tabulated in one of my recent structured Web posts.

There has been considerable discussion in recent days on various ontology and semantic Web mailing lists about how some practices may solve or not questions of semantic matching. Some express sentiments that proper use of URIs, use of similar namespaces and use of some predicates like owl:sameAs may largely resolve these matters.

However, discussion in David’s ISWC 2007 paper and use of the Potluck demo readily show the pragmatic issues in such matches. Section 2 in the paper presents a readable scenario for real-world challenges in how a historian without programming skills would go about matching and merging data. Despite best practices, and even if all are pursued, actually “meshing” data together from different sources requires judgment and reconciliation. One of the great values of Potluck is as a heuristic and learning tool for making prominent these real-world semantic heterogeneities.

The complementary value of Potluck is its innovative interface design for actually doing such meshing. Potluck is a case argument that pragmatic solutions and designs only come about by just “doing it.”

Easy, Simultaneous Editing

(Note: Though a diagram illustrates some points below, it is no substitute for using Potluck yourself.)

Potluck uses a simple drag-and-drop model for matching fields from different datasets. In the left-hand oval in the diagram below, the user clicks on a field name in a record, drags it to a column, and then repeats that process for matching fields in a records of a different dataset. In the instance below, we are matching the field names of “address” and “birth-place”, which then also get color coded by dataset:

Potluck's Simultaneous Group Editing
[Click on image for full-size pop-up]

This process can be repeated for multiple field matches. The merged fields themselves can be subsequently dragged-and-dropped to new columns for renaming or still further merging.

The core innovation at the heart of Potluck is what happens next. By clicking on Edit for any record in a merged field, the dialog shown above pops up. This dialog supports simultaneous group editing based on LAPIS, another MIT tool for editing text with lightweight structure developed by Ron Miller and team.

As implemented in JavaScript in Potluck, LAPIS first groups data items by similar patterned structure; this initial grouping is what determines the various columns in the above display. Then, when the user highlights any pattern in a column, these are repeated (see same cursors and shading in the right-hand oval) for all entries in the column. They can then be deleted (for pruning, in this case removing ‘USA’), or cut-and-pasted (such as for changing first- and last-name order) for all items in a column. (Single item editing is obviously also an option.)

The first grouping mostly ensures that data formatted differently in different datasets are displayed in their own column. One data form is used for the merged field, and all other columns are group edited to conform. The actual patterns are based on runs of digits, letters, white spaces, or individual punctuation marks and symbols, which are then “greedy” aligned for first the column grouping and then for cursor alignment within columns on highlighted patterns.

The net result is very fast and efficient bulk editing. This approach points the way to more complicated pattern matches and other substitution possibilities (such as unit changes or date and time formats).

Rough Spots and A Hope

I was tempted to award Potluck one of AI3‘s Jewels and Doubloons Awards, but the tool is still premature with rough spots and gaps. For examples, IE and browser support needs to be improved; it would be helpful to be able to delete a record from inclusion in the mashup. (Sometimes only after combining is it clear some records don’t belong together.)

One big issue is that the system does not yet work well with all external sites. For example, my own Sweet Tools Exhibit refused to load and the one from the European Space Agency’s Advanced Concept Team caused JavaScript errors.

Another big issue is that whole classes of functionality, such as writing out combined results or more data view options, are missing.

Of course, this code is not claimed to be commercial grade. What is most important is its pathbreaking approach to semantic mashups (actually, what some others such as Jonathan Lathem have called ‘smashups’) and interfaces and approaches to group editing techniques.

I hope that others pick up on this tool in earnest. David Huynh is himself getting close to completing his degree and may not have much time in the foreseeable future to continue Potluck development. Besides Potluck’s potential to evolve into a real production-grade utility, I think its potential to act as a learning test bed for new UI approaches and techniques for resolving semantic heterogeneities is even greater.

Posted by AI3's author, Mike Bergman

Posted on July 24, 2007 at 4:25 pm in Adaptive Information, Information Automation, Open Source, Semantic Web Tools, Structured Web | Comments (2)
The URI link reference to this post is: http://www.mkbergman.com/392/potluck-another-tasty-meal-from-mit-simile/
The URI to trackback this post is: http://www.mkbergman.com/392/potluck-another-tasty-meal-from-mit-simile/trackback/
Page 10 of 17« First...89101112...Last »
Copyright © 2004–2013 Michael K. Bergman.   This work is licensed under a Creative Commons License