This is the Place to Start with the Academic Literature
Dan Roth and his former post-doc, Yangqiu Song, yesterday released a major paper on machine learning with knowledge bases, “Machine Learning with World Knowledge: The Position and Survey.” This 20-page paper with 250 references is a goldmine of starting points and a useful organizational schema for how to look at various machine learning applications and methods based on large-scale knowledge bases. The paper covers the exact territory that we refer to as knowledge-based artificial intelligence, or KBAI.
I always have my eye out for papers by Roth. Both he and his colleague at the University of Illinois Urbana-Champaign, Jiawei Han, publish well-reasoned, practical papers in the areas of knowledge representation and data mining. Various groups at Illinois also offer open-source software resources useful to these tasks. These efforts, in my view, are some of the best available worldwide.
The authors are proponents of the use of knowledge bases in machine learning for the same reasons we are: “Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. Particularly, labeling large amount of data for each domain-specific problem can be very time consuming and costly. It has become a key obstacle in making learning protocols realistic in applications.” The authors then detail how knowledge bases can overcome these problems.
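The labeling bottleneck the authors describe is exactly what distant supervision attacks: rather than annotating examples by hand, known facts from a knowledge base are projected onto raw text to generate labels automatically. Here is a minimal sketch of that idea; the mini knowledge base, relations, and sentences are invented for illustration, not drawn from the paper.

```python
# Toy distant-supervision sketch: project knowledge-base facts onto
# raw sentences to auto-generate relation labels, with no manual
# annotation. The facts and corpus below are invented examples.

KB_FACTS = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
    ("Einstein", "Germany"): "born_in",
}

def distant_label(sentence):
    """Return a relation label if the sentence mentions an entity
    pair for which the knowledge base already records a relation."""
    tokens = set(sentence.replace(",", "").split())
    for (e1, e2), relation in KB_FACTS.items():
        if e1 in tokens and e2 in tokens:
            return relation
    return None  # no KB fact matched; the sentence stays unlabeled

corpus = [
    "Paris is the capital of France",
    "Einstein was born in Germany",
    "The weather is nice today",
]
labels = [distant_label(s) for s in corpus]
print(labels)  # ['capital_of', 'born_in', None]
```

The auto-labeled pairs can then feed a conventional supervised learner; the well-known cost is label noise, since a sentence may mention both entities without expressing the relation.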
Besides the valuable reference listing, the other real contribution of the paper is how it frames and organizes what roles knowledge bases can play in AI. Like the paper’s problem statement, the authors organize their presentation around features and labels. They discuss the use and role of various techniques in relation to machine learning applications. The way they structure their presentation should be a help to those new to KBAI and the variety of terminology inherent to the field. Based on my own experience, I find their characterizations and guidance to be spot on.
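On the feature side of the authors' framing, the basic move is to map surface words onto knowledge-base concepts, so that documents sharing no vocabulary can still share features. A minimal sketch, with an invented concept map standing in for a real knowledge base:

```python
# Toy sketch of KB-derived features: replace sparse word features
# with knowledge-base concepts. The concept map is an invented
# stand-in for a real KB's type or category assignments.

CONCEPTS = {
    "python": "programming_language",
    "java": "programming_language",
    "paris": "city",
    "berlin": "city",
}

def concept_features(text):
    """Return the bag of KB concepts mentioned in the text."""
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}

a = concept_features("Python is great")
b = concept_features("Java is verbose")
print(a & b)  # {'programming_language'} -- overlap despite disjoint words
```

Real systems use far richer mappings (entity types, categories, embeddings), but the principle is the same: the knowledge base supplies generalizing features that raw tokens cannot.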
I really have only one bone to pick with this otherwise excellent paper. Nowhere do the authors discuss the quality, coherence, or accuracy of the underlying knowledge bases used to perform these tasks. Many of the cited KB sources have known quality problems, some of which we have discussed before, such as Wikipedia (coverage and category structure), YAGO (reliance on WordNet), Freebase (highly variable quality and no longer maintained), Cyc (questionable upper ontology and mis-assignments), DBpedia (simplistic schema), etc. One of the major reasons for our efforts with KBpedia is to continue to work to create cleaner training environments suitable to machine learning, so as to reduce the GIGO problem.
Still, quibbles aside, this paper will prove highly useful to anyone interested in distant supervised machine learning and knowledge-based artificial intelligence. For the foreseeable future, this paper should be a standard reference in your KBAI library.