(Holy Leap Year, Batman!)
I’ve stated many times that I hate WordPress upgrades. I know the sponsors have tried to make them easier over time, but upgrades are still painful, fraught with risk and error, and always force me to research and figure out what went wrong.
I last upgraded to WP v. 2.2.1, and with a real rant to accompany it.
Since then, some of us had been seeing some insidious stuff getting inserted into our RSS feeds, but had not been able to stem it. Then, I was doing my normal morning systems check and saw that my site was completely down, completely blank. Grrrr. Who knows what that specific problem was.
Version 2.3.3 had been announced with a fix for the RSS feed spam problem, so, rather than trying to diagnose and fix my current version, it was time to upgrade. (Grrrr.)
But, then I realized, possibly by doing so, I might also see a fix to a longstanding issue I had had with plug-ins somehow limiting my chronological listing of past posts. (Hooray!) That one had really been sticking in my craw, had caused me to de-activate some plug-ins I thought useful, and had led to only a handful of prior posts appearing.
So, the upgrade was made. Sadly, no problems (other than the XML-RPC implementation issue) were solved. And, unfortunately, my chronological listings still displayed only the most recent 30 or so posts. (Grrrr.)
Well, s**t. So after what was, for me, with some of my site’s more complicated aspects, a nearly two-hour “minor” upgrade, the only real benefit I or my readers would see was that the site was no longer blank! This hardly looked like a good deal.
So, assuming the chronology problem fix was not near at hand, I decided to manually add the past entries back to my chronology page. (Actually, this sounds worse than it really is since I have learned some quick tricks for gleaning listings from other sites; I just turned those techniques on my own blog!). While grinding teeth to nubs, I did what everyone who works intimately with software often does: I did the workaround.
So, now all full listings have been restored (though still with some recent postings overlap; Grrrr).
What brought a smile was seeing some posts from a year or two ago that I liked and had completely forgotten; some others brought a shudder.
Nonetheless, all 250 or so posts on my site from Day 1 in early 2005 can now be seen again; it has been a while!
Naturally, that was not the end of the saga.
After making the upgrade, I noticed that all category listings and lookups had been wiped off my blog. I could see them in MySQL, and the editor still had the listing, but the site itself and the admin panel were blank.
Grrrr. (Try to stay calm and not panic.)
It’s another one of those deals where it is time to search like crazy and hope that someone more knowledgeable than me has encountered the same problem and fixed it. Sure enough, in an obscure reference, I got the glimmer that maybe restarting MySQL could fix the problem.
Well, it did. But go figure. . . .
Thankfully, my Advanced TinyMCE plug-in, which gives me more editing functions, works great in WP v. 2.3.3. At least that is a relief!
And so, we end on an anti-Grrrr note. Sweet dreams.
We are proceeding apace with the first release of the UMBEL (Upper Mapping and Binding Exchange Layer) lightweight subject concept ontology. The internal working version presently has 21,580 subject nodes, though further review will certainly change that number before public release of the first draft.
UMBEL defines “subject concepts” as a distinct subset of the more broadly understood notion of a concept, such as that used in the SKOS controlled vocabulary, in formal concept analysis, or in the very general concepts common to some upper ontologies. Subject concepts are a special kind of concept: ones that are concrete, subject-related and non-abstract. We further contrast these with named entities, which are the real things or instances in the world that are members of these subject concept classes.
Thus, in UMBEL parlance, there are abstract concepts, subject concepts and named entities.
The “backbone” of UMBEL is its set of these reference (“canonical,” if you will) subject concepts, which are being derived from the OpenCyc version of the Cyc knowledge base. The resulting 22 K nodes of this subject structure are related via the subClassOf and type predicates; these are the graph’s edges. The graph pictures herein are the first glimpse of this UMBEL backbone structure.
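To make the backbone idea concrete, here is a minimal sketch of how subject concepts form a class hierarchy via subClassOf edges while named entities attach to those classes via type edges. The node names and helper functions are purely illustrative, not actual UMBEL labels or tooling.

```python
# Toy triple store: subject concepts related by subClassOf edges;
# named entities attached via type edges. Names are invented examples.
from collections import defaultdict

triples = [
    ("Automobile", "subClassOf", "Vehicle"),
    ("Vehicle",    "subClassOf", "Artifact"),
    ("Saab",       "type",       "Automobile"),   # a named entity
    ("Volvo",      "type",       "Automobile"),   # another named entity
]

def superclasses(node):
    """All subject concepts reachable from `node` via subClassOf edges."""
    index = defaultdict(list)
    for s, p, o in triples:
        if p == "subClassOf":
            index[s].append(o)
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        for parent in index[n]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def classes_of(entity):
    """An entity's classes: its direct type plus that type's superclasses."""
    direct = {o for s, p, o in triples if s == entity and p == "type"}
    result = set(direct)
    for c in direct:
        result |= superclasses(c)
    return result

print(classes_of("Saab"))
```

This is, of course, just the skeleton of the idea; the real backbone carries some 22 K such nodes, and a reasoner over the actual RDF would do the same transitive walk.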
We can take the full network graph and do a bit of simulation of diving deep into its structure, as the following figures show.
So, here is the big graph, with all nodes and edges (blue) displayed. This is just about at the limit of our graphing program, Cytoscape, which we estimate is limited to about 30 K nodes:
Through the manipulation of the topological coefficient, which is a relative measure for the extent to which a node shares neighbors with other nodes, we can zoom in on the Top 750 (actually, 759!) node gateways or hubs. There are other ways to evaluate key nodes in a network, but this one fairly nicely approximates the upper structure or hierarchy within the graph:
By tightening the coefficient further, we can get a view of the Top 350 (actually, the top 336). Were the system live and not a captured JPEG, we could zoom in and read the actual node labels.
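For readers curious how the topological coefficient itself is computed, here is a small sketch following the definition used by Cytoscape’s NetworkAnalyzer (the average number of neighbors a node shares with each other node it shares any neighbor with, plus one for a direct link, normalized by the node’s degree). The tiny graph is purely illustrative.

```python
# Topological coefficient, per the NetworkAnalyzer definition.
# `graph` is a toy undirected graph given as adjacency sets.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def topological_coefficient(g, n):
    """Average shared-neighbor count of n with each node it shares at
    least one neighbor with, divided by n's degree."""
    neighbors = g[n]
    scores = []
    for m in g:
        if m == n:
            continue
        shared = len(neighbors & g[m])
        if shared == 0:
            continue          # m shares no neighbors with n; skip it
        if m in neighbors:
            shared += 1       # +1 for a direct edge between n and m
        scores.append(shared)
    if not scores or not neighbors:
        return 0.0
    return sum(scores) / len(scores) / len(neighbors)
```

Ranking all nodes by this coefficient and thresholding it is, in essence, how the Top 750 and Top 350 cuts above were selected.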
The real value from a graph structure, of course, is that now we can make selections based on relationships, neighbors and distances for various reasoning, inference or relatedness purposes. This diagram begins by inputting “saab” as my car concept, and then getting all nodes within two links:
Alternatively, for the same “saab” car concept, I asked for all directly related links (in yellow) and did some pruning of car types to make the subgraph more readable and interesting:
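The “all nodes within two links” selection shown above is, under the hood, just a breadth-first traversal limited to a fixed radius. A minimal sketch (the node names are invented stand-ins for actual UMBEL subject concepts):

```python
# Radius-limited breadth-first search: collect every node within
# `radius` links of a starting concept. Toy graph for illustration.
from collections import deque

graph = {
    "saab":       {"automobile"},
    "automobile": {"saab", "vehicle", "volvo"},
    "vehicle":    {"automobile", "artifact"},
    "volvo":      {"automobile"},
    "artifact":   {"vehicle"},
}

def neighborhood(g, start, radius):
    """Return every node within `radius` links of `start` (inclusive)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == radius:
            continue          # do not expand past the radius
        for nbr in g[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)

subgraph_nodes = neighborhood(graph, "saab", 2)
```

In Cytoscape this is a point-and-click selection, but the underlying operation on the 22 K-node backbone is exactly this kind of bounded traversal.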
This ability to manipulate and navigate this large subject backbone at will should bring immense benefits. And, because of its common sense grounding, the early explorations of this first-glimpse UMBEL structure look very logical and clean.
Once we complete the next packaging and draft release steps, anyone will be able to play with and manipulate this UMBEL structure at will. The ontology and the tools we are using to manipulate it are all open source.
Our next steps on UMBEL will have us publishing the technical report (TR) of how we screened and vetted the subject concepts from the Cyc knowledge base, using an updated OpenCyc version. That document will hopefully gain some broader review and scrutiny for the canonical listing of subject concepts.
Of course, all of that is merely leading up to the Release 0 of the published ontology. We are working diligently to get that posted as well in the very near future.
These graphs were built using the super Cytoscape large-graph visualization framework, which I previously reviewed with glowing praise. The subgraph extractions were greatly aided by a fantastic add-in called NetworkAnalyzer from the Max-Planck-Institut für Informatik. I will be writing more about this add-in at a later time, including some guidance for how to use it for meaningful ontology analysis. But, in the meantime, do check this add-in out. Mucho cool, and another winner!
Please, all, I encourage you to read the bottom portion of this short posting carefully and to bookmark it for future reference. It doesn’t get much shorter or sweeter than this.
Nice job, Danny! Now, do you care to take on the update of the layer cake?
Since about 2005 — and at an accelerating pace — Wikipedia has emerged as the leading online knowledge base for conducting semantic Web and related research. The system is being tapped for both data and structure. Wikipedia has arguably replaced WordNet as the leading lexicon for concepts and relations. Because of its scope and popularity, many argue that Wikipedia is emerging as the de facto structure for classifying and organizing knowledge in the 21st century.
Since the project’s announcement in July 2007, our work on the UMBEL lightweight reference subject concept structure has identified Wikipedia as a key intended resource for subject concepts and entities. For the past few months I have been scouring the globe, attempting to find every drop of research I could on the use of Wikipedia for the semantic Web, information extraction, categorization and related issues.
Thus, I’m pleased to offer up herein the most comprehensive such listing available anywhere: more than 99 resources and counting! (I say “more than” because some entries below have multiple resources; I just liked the sound of 99 as a round number!)
Wikipedia itself maintains a listing of academic studies using Wikipedia as a resource; fewer than one-third of the listings below are on that list (which itself may be an indication of the current state of completeness within Wikipedia). Some bloggers and other sources around the Web also maintain listings in lesser degrees of completeness.
The tremendous growth of content and topics within Wikipedia is well documented (see, as examples, the W1, W2, W3, W4, W5, W6 and W7 internal Wikipedia sources for gory details); as of early 2008 there were about 2.25 million articles in English, with versions in 256 languages and variants.
Download access to the full knowledge base has enabled the development of notable core references for the Linked Data aspects of the semantic Web, such as DBpedia [5,6] and YAGO [72,73]. Entire research teams, such as Ponzetto and Strube [61-65] (and others as well; see below), are moving toward creating full-blown ontologies or structured knowledge bases useful for semantic Web purposes based on Wikipedia. So, one of the first and principal uses of Wikipedia to date has been as a data source of concepts, entities and relations.
But much broader data mining and text analysis is being conducted against Wikipedia, work that is currently defining the state of the art in these areas, too:
These objectives, in turn, involve mining and extracting various kinds of structure within Wikipedia:
These are some of the specific uses that are included in the 99 resources listed below.
This is an exciting (and, for most all of us just a few years back, unanticipated) use of the Web in socially relevant and contextual knowledge and research. I’m sure such a listing one year from now will be double in size or larger!
BTW, suggestions for new or overlooked entries are very much welcomed!
Linked Data follows recommended practices for identifying, exposing and connecting data on the semantic Web. A robust Linking Open Data (LOD) community has rapidly developed around the practice since its approval as a formal project of the W3C’s Semantic Web Education and Outreach (SWEO) Interest Group in March 2007. Though counts rapidly become dated, today, in less than a year, Linked Data on the Web has grown to exceed several billion RDF triples.
This foundation of interlinkable data comes from the highest-value reference sources available, and includes most of the notable place, people, event, book, music, cultural, language and government entities. The following official figure of the LOD community, maintained by one of its founders, Richard Cyganiak, is updated frequently (click on the figure below to get the most recent interactive version), and shows well the breadth of this data value:
It would be putting it mildly to say that the LOD project has been a roaring success. New initiatives like the Billion Triple Challenge will continue to rapidly push forward the size and frontiers of Linked Data. We also have two signal events coming up in 2008 that demonstrate just how much Linked Data is coming of age.
The newly announced LinkedData Planet Conference and Expo being held in New York City on June 17th and 18th is notable for a number of reasons (besides Tim Berners-Lee being the special keynote speaker).
First, the conference represents the first direct exposure of Linked Data to the business and enterprise community. For years, the semantic Web community has largely been an academic one, with its own set of meetings and venues. That began to change with the recent series of Semantic Technology Conferences, held as usual in San Jose in May (this year’s is May 18-22). The 2007 meeting drew more than 800 attendees and marked the first time a significant presence from the academic and research communities occurred.
Though valuable and chock-a-block with enterprise case studies, the Semantic Technology conference is also challenged by the amorphous understanding of what the “semantic Web” is. Reaching common understandings and getting cross-fertilization between the business and research communities can be a challenge.
Second, the LinkedData Planet Conference is occurring in NYC with an anticipated strong participation from East Coast financial interests. Like the Semantic Technology Conferences, it is important that the nascent technologies and applications supporting Linked Data receive the venture and funding attention they deserve.
Third, Jupitermedia is the event manager for the conference. Jupitermedia has a long history in producing quality industry events such as Internet World, Search Engine Strategies and ISPcon. Its meetings typically blend excellent content with strong community outreach and participation.
But, last, and to my mind most important, the very topic of Linked Data is focused and pragmatic. There are real methods, real techniques and real applications available to take advantage of Linked Data now. The business community need not wait on full semantics and total data exposure and automation in order to receive real value today.
Last July I wrote a piece entitled More Structure, More Terminology and (hopefully) More Clarity. It, and related posts on the structured Web, had as their thesis that the Web was naturally evolving from a document-centric basis to a “Web of Data.” We already have much structured data available, and the means, through RDFizers and other techniques, to convert that structure to Linked Data. Linked Data thus represents a very doable and pragmatic way station on the road to the semantic Web. It is a journey we can take today; indeed, many already are, as the growth figures noted above attest.
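The RDFizer idea just mentioned is conceptually simple: take an existing structured record and emit it as RDF triples. A toy sketch in N-Triples syntax follows; the namespaces, fields and values here are entirely invented for illustration, not any real RDFizer’s output.

```python
# Minimal "RDFizer" sketch: map one structured record to N-Triples.
# All URIs and field names below are hypothetical.
record = {"id": "123", "name": "UMBEL", "started": "2007-07"}

BASE = "http://example.org/project/"   # hypothetical subject namespace
PRED = "http://example.org/vocab/"     # hypothetical predicate namespace

def rdfize(rec):
    """Emit each non-id field of `rec` as one N-Triples statement."""
    subject = f"<{BASE}{rec['id']}>"
    lines = []
    for key, value in rec.items():
        if key == "id":
            continue                   # the id becomes the subject URI
        lines.append(f'{subject} <{PRED}{key}> "{value}" .')
    return "\n".join(lines)

print(rdfize(record))
```

Real RDFizers of course handle datatypes, vocabulary mappings and bulk formats, but every one of them performs this same record-to-triples pivot at its core.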
Here is a repeat of the diagram I used last July to make that argument (now highlighting the Linked Data phase in red):
[Diagram: the Web’s transition from the Document Web to the Structured Web, with the Linked Data phase highlighted]
Hopefully, the LinkedData Planet Conference will act as a similar catalyst within the business community as the LOD project has been in the research one. And, hopefully, academia, research, venture and business interests can all come together over these two days to exploit the Linked Data value so readily at hand.
Another signal event coming up is the Linked Data on the Web (LDOW) workshop at the 17th International World Wide Web Conference (WWW2008) in Beijing. LDOW is a full-day session involving a mix of papers and demos on April 22. A significant roster of very interesting submissions has been made (disclosure: I am both on the program committee and a submitting author).
Linked Data was arguably the highlight at WWW2007 in Banff, and LDOW will certainly show just how far this approach has come in one short year. LDOW will likely provide a preview of many of the applications and approaches that will receive fuller attention at the LinkedData Planet Conference two months later.
* * * *
It is exciting to see Linked Data emerging as today’s pragmatic focus for bringing further structure, connections and semantics to the Web. For those of you new to the concept, I encourage you to become active and involved in 2008. And, for those of you already active, I look forward personally to working with you further in the coming year.