Posted: May 25, 2007

T-SIOC, Object-centered Sociality

AI3 Note: This is the first experiment of directly publishing a blog post from another blog based solely on its RDF. The output came from the new WP SIOC plugin from the inestimable CaptSolo. I also created a new title and then edited the original title slightly as the sub-title. The direct stuff comes next. Thx, Capt!

This post was created by the WordPress SIOC Import plugin based on this SIOC RDF data describing a post located at

I've been reading Jyri Zengestrom's post about object-centred sociality again and I think this illustrates one usage of our SIOC Types module (T-SIOC) very nicely. I've extended my previous picture showing a person being linked across communities to this idea of people (via their user profiles) being connected by the content they create together, co-annotate, or for which they use similar annotations. Bob and Carol are connected via bookmarked URLs that they both have annotated and also through events that they are both attending, and Alice and Bob are using similar tags and are subscribed to the same blogs.
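For readers who think better in code, here is a minimal Python sketch of the idea that people are connected through the objects they share. The people come from the example above; the object identifiers (`tag:...`, `url:...`, and so on) are invented placeholders, not real SIOC or T-SIOC data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data mirroring the example in the post: each person is
# linked to the objects they tag with, subscribe to, bookmark, or attend.
# The object identifiers are made up for illustration.
person_objects = {
    "Alice": {"tag:semweb", "tag:rdf", "blog:example-blog"},
    "Bob": {"tag:semweb", "tag:rdf", "blog:example-blog",
            "url:example.org/sioc", "event:workshop"},
    "Carol": {"url:example.org/sioc", "event:workshop"},
}


def shared_objects(people):
    """Return {(person_a, person_b): set of objects both are linked to}."""
    links = defaultdict(set)
    for a, b in combinations(sorted(people), 2):
        common = people[a] & people[b]
        if common:
            links[(a, b)] = common
    return dict(links)


connections = shared_objects(person_objects)
for (a, b), objects in sorted(connections.items()):
    print(f"{a} <-> {b} via {sorted(objects)}")
```

Alice and Carol share no objects in this toy data, so they only connect indirectly, through Bob.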


(See also Jyri Zengestrom's presentation on object-centred sociality, his paper on collaborative intentionality and social knots, and this resource about organisations and objects.)

Posted: May 15, 2007

Eat Your Greens!

You Can Make a Contribution by Adopting These Standards

You may have observed some changes to my masthead thingeys. I have added a couple of new icons, Get FOAF Profile and Get SIOC Profile (stage right, upper quadrant), to (hopefully) more prominently display some important stuff. These icons are for some standard RDF ontologies that are becoming prevalent, FOAF (Friend of a Friend) and SIOC (Semantically Interlinked Online Communities, pronounced “shock”).

If you click on either icon, you will see the respective FOAF or SIOC profile for my site, courtesy of Uldis Bojārs’ SIOC browser (see below).

The truth is, today, these things are largely for the “insiders” working on semantic Web stuff. But the truth is also, tomorrow (and I mean that literally), you should know about these things and possibly adopt them for your own site [1].

OK, Let’s Hear the Acronyms Again

You have likely noticed that I have repeatedly used the acronyms of FOAF, SIOC, DOAP and SKOS (among many others!) in my recent postings. What is interesting about this alphabet soup is that they represent sort of “standard” ways to discuss people, communities, projects and “ontologies” within the semantic Web community. So, the first observation is that it is useful to have “standard” ways to describe anything.

Each of these standards is an RDF (Resource Description Framework) ontology or, perhaps in a more understood way, a common vocabulary and world view about how to discuss something. (By the way, don’t get all confused in such discussions with XML; XML is but one way of how you might describe something — which may or may not apply to any given semantic Web concept — versus what you are actually describing.)

Just as Wikipedia or Google have emerged as the “standards” within their domains, these RDF ontologies may or may not survive the brutal survival of the fittest within their own domains. So, I’m not asserting whether any of these formats is going to make it. But I am asserting that such formats are part of the emerging structured Web.

In fact, in one back story to the recently completed WWW 2007 conference in Banff, Alberta (a rousing success by all accounts!), Tim Berners-Lee noted he only wanted to have his photo taken with those who already had a FOAF profile!

The Start of Good Nutrition

A typical concern about the semantic Web is, “Who pays the tax?” In other words, to get the advantage of the metadata and characterizations of content that allows retrieval and manipulation at the object level (as opposed to the document or page level), why do it? Who benefits? Why incur the effort and cost?

We all now understand the benefits of an interlinked document world. We see it many times daily; indeed, it is now such a part of our cultural and daily life as to go unquestioned. Wow! How could that have happened in a mere decade?

The same thing is true for these semantic Web and RDF ontology constructs. We need to keep twirling the stick until enough friction takes place to cause the fire’s spark.

So, anyone interested in this next phase of the Web (that is, the move to structure and linked data) needs to eat their greens. A small percentage of a large number, combined with network effects, will cause this fire to grow hot. The real point behind such efforts, of course, is that if no one listens it is unimportant; but if many listen, importance grows as a function of this network effect.

Of course, over time, the mechanisms to create the structure upon which the semantic Web works will occur automatically and in the background. (Just as today many no longer hand-code the HTML in their Web pages.) But learning about the general structure of RDF and spending some time hand-coding your own FOAF profile is a worthwhile start to good semantic nutrition.

Greens are Yummy

Do you remember your first attempt to learn HTML? What was it like to learn how to effectively query a search engine? What are the other myriad ways we have learned and adapted to this (now) constant Web environment?

So, assuming you want to get a bit exposed and expose your own Web site to these standards, what do you do next? Not to be definitive, but here are some approaches solely from my own standpoint as a blogger who uses WordPress. Other leading blog and CMS software have slightly different requirements; just search on your brand in combination with one of the alphabet soup acronyms.


FOAF is kind of fun. FOAF is an RDF vocabulary for describing people, organizations, and relations among them (see here for specification). My own observation is that it is less used for the friend part and more as a self-description.

Conceptually, FOAF is simple: Who are you, and who are your friends? Yet it is broadly extensible to include any variety or details in terms of personal, family, work history, likes, wants and preferences (via the addition of more namespaces). And, unfortunately, it is also generally pretty crappy in how it is applied and installed. So, if you truly wanted to get your picture taken with TimBL, what the heck should you do?

First, learn about FOAF itself. Christopher Allen in his Life with Alacrity blog [2] wrote a good introduction to FOAF about three years ago. He explains online FOAF services to help you craft your first foaf.rdf file, validation services, and other nuances.
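If you would rather generate that first file than type it by hand, the following is a minimal sketch using only the Python standard library. It writes a bare-bones FOAF description with a name, a homepage and one `foaf:knows` entry. The function name `build_foaf` and the sample values are placeholders of my own; the terms used (`foaf:Person`, `foaf:name`, `foaf:homepage`, `foaf:knows`) are from the FOAF vocabulary, but check the specification before relying on anything beyond these basics.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to serialize with the conventional rdf: and foaf: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def build_foaf(name, homepage, friends):
    """Build a minimal FOAF document as an RDF/XML string.

    `friends` is a list of names; each becomes a foaf:knows entry.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = name
    ET.SubElement(person, f"{{{FOAF_NS}}}homepage",
                  {f"{{{RDF_NS}}}resource": homepage})
    for friend in friends:
        knows = ET.SubElement(person, f"{{{FOAF_NS}}}knows")
        friend_el = ET.SubElement(knows, f"{{{FOAF_NS}}}Person")
        ET.SubElement(friend_el, f"{{{FOAF_NS}}}name").text = friend
    return ET.tostring(root, encoding="unicode")


# Placeholder values; substitute your own details.
foaf_xml = build_foaf("Jane Example", "http://example.org/", ["John Example"])
print(foaf_xml)
```

Save the output as foaf.rdf and run it through one of the validation services before publishing it.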

Next, you will need to publicize the availability of your FOAF profile. Perhaps the best known option for WordPress is the FOAF Output Plugin from Morten Frederiksen. This is the most full-featured option, but it is also abstract and difficult to follow from an installation and configuration standpoint [3]. (Pretty common for this stuff.) I personally chose the FOAF Header plug-in by Ben Hyde (with MIT’s Simile program), which was simple, direct and it works.

Once you’ve got a basic profile working, it is then time to really tweak your profile according to your needs and desires. To really configure your profile, consult the FOAF vocabulary. Note that not all FOAF statements are stable. (That is, third-party apps may not parse or support the statements.) Please also note that FOAF has no mechanism to restrict or limit information to different recipients. Only put stuff in your profile that you want to be public.

You may also want to check out these personal FOAF profiles or the FOAF bulletin board to get some great examples of different FOAF content and styles.


SIOC provides methods for interconnecting discussion methods such as blogs, forums and mailing lists to each other (see here for specification). There is a partial compilation of SIOC-enabled sites on the ESW wiki.

Uldis Bojārs (nick CaptSolo) is doing a remarkable job across the board on these RDF standard ontology issues. First, he has written the definitive browser for these protocols, the SIOC browser. Second, he has contributed with others in the SIOC community to create general SIOC exporters for aggregators, DotClear, Drupal, mailing lists, a PHP API, phpBB and WordPress [4]. It is the latter version that I use on my own blog (the installation of which I initially documented last August).

Third, Uldis has written the Firefox add-on Semantic Radar. Semantic Radar inspects Web pages and detects links to semantic Web metadata from SIOC, FOAF or DOAP. When any of these forms are detected, its accompanying icon displays in the Firefox status bar and, if clicked, then displays the profile record in the SIOC browser. Very cool. (And, oh, BTW, Uldis is also one of the most helpful people around! :) )
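To give a feel for what this kind of detection involves, here is a rough Python sketch. It is not Semantic Radar's actual code. It assumes the common autodiscovery convention of advertising RDF metadata with a `<link>` element of type `application/rdf+xml` in the page head; the class name `RdfLinkFinder` and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect <link> elements that advertise RDF metadata."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Assumption: metadata is advertised with type="application/rdf+xml".
        if attr.get("type") == "application/rdf+xml" and attr.get("href"):
            self.found.append((attr.get("title", ""), attr["href"]))


# A made-up page head for illustration.
sample_html = """
<html><head>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://example.org/foaf.rdf" />
<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://example.org/sioc.rdf" />
</head><body></body></html>
"""

finder = RdfLinkFinder()
finder.feed(sample_html)
for title, href in finder.found:
    print(title, href)
```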

Another cool option of Semantic Radar is that it pings Ping the Semantic Web (PTSW) when it detects a compliant site. PTSW is itself a highly useful aggregator of the semantic Web community and is one of Frédérick Giasson’s innovations exploiting interlinked data. This is a good site to monitor RDF publishing on the Web and new instances of sites that comply with the alphabet standards.

Staying Healthy

So, with the addition of the icons above, I’m now eating my greens! And, you know what, they’re both fun and nutritious.

I will continue to add to and improve my various online profiles. In the immediate future, I also plan to add DOAP and SKOS characterizations of my site and work. Now, back to the hand coding . . . .

[1] If one looks to early bloggers or early podcasters or whatever, the truth is that first users tend to get more traffic and attention. If you are a new blogger today, for example, the sad truth is that your ability to find a large audience is much reduced. Of course, that statement does not mean that you can not find a large audience (if that is your objective, and I don’t mean to suggest the only meaningful one either!), just that that percentage likelihood is lower today than yesterday.

[2] Chris has really scaled back on his online writing, which is a shame. His blog and the quality of his material were among the reasons I took up blogging myself two years ago.

[3] Too many semantic Web options are hard to understand, install, configure or use. Too bad; the FOAF Output plug-in has mostly really good stuff here that most casual users would never consider.

[4] Key tools have been written by John Breslin (phpBB, Drupal), Alexandre Passant (DotClear, PHP API, bit of Drupal), Sergio Fdez (mailing lists) and Uldis (WordPress and help on the PHP API).

Posted by AI3's author, Mike Bergman Posted on May 15, 2007 at 1:27 pm in Adaptive Information, Semantic Web, Semantic Web Tools | Comments (3)
Posted: May 8, 2007

Now, that’s way cool!

Jasper Potts and his team at Sun have worked up some very nice magic with photo display and editing. Iris is an online photo browsing, editing and slide show application.

It is a “smash-up” of Java applets and next generation web concepts. You can create galleries, edit photos, rotate and create that cool 3D rotating photo cube effect we’ve been seeing lately, you name it. Jasper’s short online demo of Iris is really cool, too! (I think the live demo is to be presented in Jasper’s talk at JavaOne tomorrow.)

IRIS Online Demo

Iris works with the Flickr online photo service. Under the hood, a Java applet receives JavaScript events when you click on the Iris images and contacts the Flickr web service; it actually provides the on-screen ability to do remote Flickr operations. (Probably I shouldn’t mention it, shhhh, but this is what semantic Web interfaces can do once they become cool with remote data.)

Hey! Shag me baby!

BTW, thanks to Henry Story for the link to this (he is always sniffing out the good Java stuff).

Posted by AI3's author, Mike Bergman Posted on May 8, 2007 at 8:06 pm in Adaptive Innovation, Semantic Web Tools | Comments (0)
Posted: April 25, 2007

CKC Challenge at WWW 2007 The CKC Challenge Highlights A New Generation of Semantic Web Tools

Many predict, and I concur, that collaborative methods to add rigor and structure to tagging and other Web 2.0 techniques will be one of the next growth areas for the semantic Web. Under the leadership of the University of Southampton, Stanford University and the University of Karlsruhe, the Collaborative Knowledge Construction (CKC) Challenge has been designed to seek use and feedback on this new generation of semantic Web collaboration tools.

Anyone is welcome to register and participate during the challenge test period of April 16 – 30, with recognition to the most active and most insightful testers. The candidate tools are:

  • BibSonomy — is a Web-based social resource sharing system that allows users to organize and share bookmarks and publications collaboratively
  • Collaborative Protégé — is an extension of the existing Protégé system that supports collaborative ontology editing of components and annotations
  • DBin — is a general purpose application that enables domain experts to create “discussion groups” in which communities can annotate any subject of interest via RDF
  • Hozo — is an ontology visualization and development tool that brings version control constructs to group ontology development
  • OntoWiki — is a semantic collaboration platform implementing Web 2.0 approaches for the collaborative development of knowledge bases
  • SOBOLEO — is a system for Web-based collaboration to create SKOS taxonomies and ontologies and to annotate various Web resources using them.

Some of these tools are quite new and some I need to add to my Sweet Tools listing. The CKC Challenge Web site has nice write-ups, screen shots, and further information on these tools.

Results from the challenge will be discussed at the broader Workshop on Social and Collaborative Construction of Structured Knowledge at the 16th International World Wide Web Conference (WWW2007) in Banff, Canada, on May 8, 2007. As part of the general program, Jamie Taylor of Metaweb will also give an invited talk.

CKC Challenge participants do not need to attend in Banff to be eligible for recognition; all results and feedback will be made public by the Challenge organizers.

Posted by AI3's author, Mike Bergman Posted on April 25, 2007 at 10:23 pm in Semantic Web, Semantic Web Tools | Comments (0)
Posted: April 23, 2007

In response to my review last week of OpenLink’s offerings, Kingsley Idehen posted some really cool examples of what extracted RDF looks like. Kingsley used the content of my review to generate this structure; he also provided some further examples using DBpedia (which I have also recently discussed).

What is really neat about these examples is how amazingly easy it is to create RDF, and the “value added” that results when you do so. Below, I again discuss structure and RDF (only this time more on why it is important), describe how to create it from standard Web content, and link to a couple of other recent efforts to structurize content.

Why is Structure Important?

The first generation of the Web, what some now refer to as Web 1.0, was document-centric. Resources or links almost always referred to the single Web page (or document) that displayed in your browser. Or, stated another way, the basic, most atomic unit of organizing information was the document. Though today’s so-called Web 2.0 has added social collaboration and tagging, search and display still largely occur by this document-centric mode.

Yet, of course, a document or Web page almost always refers to many entities or objects, often thousands, which are the real atomic units of information. If we could extract out these entities or objects — that is the structure within the document, or what one might envision as the individual Lego © bricks from which the document is constructed — then we could manipulate things at a more meaningful level, a more atomic and granular level. Entities now would become the basis of manipulation, not the more jumbled up hodge-podge of the broader documents. Some have called this more atomic level of object information the “Web of data,” others use “Web 3.0.” Either term is OK, I guess, but not sufficiently evocative in my opinion to explain why all of this stuff is important.

So, what does this entity structure give us, what is this thing I’ve been calling the structured Web?

First, let’s take simple search. One problem with conventional text indexing, the basis for all major search engines, is the ambiguity of words. For example, the term ‘driver’ could refer to a printer driver, Big Bertha golf driver, NASCAR driver, a driver of screws, family car driver, a force for change, or other meanings. Entity extraction from a document can help disambiguate what “class” (often referred to as a “facet” when applied to search or browsing) is being discussed in the document (that is, its context) such that spurious search results can be removed. Extracted structure thus helps to filter unwanted search results.

Extracted entities may also enable narrowing search requests to, say, documents published only in the last month or from certain locations or by certain authors or from certain blogs or publishers or journals. Such retrieval options are not features of most search engines. Extracted structure thus adds new dimensions to characterize candidate results.
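A toy Python sketch of both ideas, filtering by facet and narrowing by date, might look like the following. The search hits, facet labels and dates are all invented, and real entity extraction is assumed to have already assigned the facet to each hit.

```python
from datetime import date

# Hypothetical search hits for the query "driver", each annotated with an
# extracted entity class (the "facet") and a publication date.
results = [
    {"title": "Updating your printer driver", "facet": "software", "published": date(2007, 4, 2)},
    {"title": "Big Bertha driver review", "facet": "golf", "published": date(2007, 4, 20)},
    {"title": "NASCAR driver standings", "facet": "motorsport", "published": date(2007, 3, 11)},
    {"title": "Choosing a golf driver", "facet": "golf", "published": date(2006, 9, 5)},
]


def filter_results(hits, facet=None, since=None):
    """Keep hits matching the facet and published on or after `since`."""
    kept = []
    for hit in hits:
        if facet is not None and hit["facet"] != facet:
            continue
        if since is not None and hit["published"] < since:
            continue
        kept.append(hit)
    return kept


golf_hits = filter_results(results, facet="golf")
recent_golf = filter_results(results, facet="golf", since=date(2007, 4, 1))
print([h["title"] for h in golf_hits])
print([h["title"] for h in recent_golf])
```

The facet removes the printer and NASCAR hits; the date then narrows what is left.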

Second, in the structured Web, the basis of information retrieval and manipulation becomes the entity. Assembling all of the relevant information — irrespective of the completeness or location of source content sites — becomes easy. In theory, with a single request, we could collate the entire corpus of information about an entity, say Albert Einstein, from life history to publications to Nobel prizes to who links to these or any other relationship. The six degrees of Kevin Bacon would become child’s play. But sadly today, knowledge workers spend too much of their time assembling such information from disparate sources, with notable incompleteness, imprecision and inefficiency.

Third, the availability of such structured information makes for meaningful information display and presentation. One of the emerging exemplars of structured presentation (among others) is ZoomInfo. This, for example, is what a search on my name produces from ZoomInfo’s person search:

Example ZoomInfo Structured Result

Granted, the listing is a bit out of date. But it is a structured view of what can be found in bits and pieces elsewhere about me, being built from about 50 contributing Web sources. And, the structure it provides in terms of name, title, employers, education, etc., is also more useful than a long page of results links with (mostly) unusable summary abstracts.

Presentations such as ZoomInfo’s will become common as we move to structured entity information on the Web as opposed to documents. And, we will see it for many more classes of entities beyond the categories of people, companies or jobs used by ZoomInfo. We are, for example, seeing such liftoff occurring in other categories of structured data within sources like DBpedia.

Fourth, we can get new types of mash-ups and data displays when this structure is combined, from calendars to tabular reports to timelines, maps, graphs of relatedness and topic clustering. We can also follow links and “explore” or “skate” this network of inter-relatedness, discovering new meanings and relationships.

And, fifth, where all of this is leading to is, of course, the semantic Web. We will be able to apply descriptive logic and draw inferences based on these relationships, resulting in the derivation of new information and connections not directly found in any of the atomic parts. However, note that much value still comes from the first areas of the structured Web alone, achievable immediately, short of this full-blown semantic Web vision.

OK, So What is this RDF Stuff Again?

As my earlier DBpedia review described, RDF — Resource Description Framework — is the data representation model at the heart of these trends. It uses a “triple” of subject-predicate-object, as generally defined by the W3C’s standard RDF model, to represent these informational entities or objects. In such triples, subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. (You can think of subjects and objects as nouns, predicates as verbs, and even think of the triples themselves as simple Dick-and-Jane sentences from a beginning reader.)

Resources are given a URI (as may also be given to predicates or objects that are not specified with a literal) so that there is a single, unique reference for each item. (OK, so here’s a tip: the length and complexity of the URIs themselves make these simple triple structures appear more complicated than they truly are! ‘Dick’ seems much more complicated when it is expressed as

These URI lookups can themselves be an individual assertion, an entire specification (as is the case, for example, when referencing the RDF or XML standards), or a complete or partial ontology for some domain or world-view. While the RDF data is often stored and displayed using XML syntax, that is not a requirement. Other RDF forms may include N3 or Turtle syntax, and variants or more schematic representations of RDF also exist.

Here are some sample statements (among a few hundred generated, see later) from my reference blog piece on OpenLink that illustrate RDF triples:

The first four items have the post itself as the subject. The last statement is an entity referenced within my subject blog post. In all cases, the specific subjects of the triple statements are resources.

In all statements, the predicates point to reference URIs that precisely define the schema or controlled vocabularies used in that triple statement. For readability, such links are sometimes aliased, such as created at (time), links to, has title, is within topic, and has label, respectively, for the five example instances. These predicates form the edges or connecting lines between nodes in the conceptual RDF graph.

Lastly, note that the object, the other node in the triple besides the subject, may be either a URI reference or a literal. Depending on the literal type, the material can be full-text indexed (one triple, for example, may point to the entire text of the blog posting, while others point to each post image) or can be used to mash-up or display information in different display formats (such as calendars or timelines for date/time data or maps where the data refer to geo-coordinates).
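To make the subject-predicate-object pattern concrete, here is a small Python sketch that holds five statements of the same shape as those described above. Every URI and value in it is a hypothetical placeholder, not one of the statements actually generated for the post, and the choice of Dublin Core, SIOC and RDFS predicates is my own assumption for illustration.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])


class Uri(str):
    """Marks a string as a URI reference rather than a literal."""


# All URIs and values below are hypothetical placeholders.
POST = Uri("http://example.org/blog/openlink-review")
TOPIC = Uri("http://example.org/topics/semantic-web")

triples = [
    # created at (time): the object is a literal
    Triple(POST, Uri("http://purl.org/dc/terms/created"), "2007-04-16T10:00:00Z"),
    # links to: the object is another resource
    Triple(POST, Uri("http://rdfs.org/sioc/ns#links_to"), Uri("http://example.org/other-page")),
    # has title: literal
    Triple(POST, Uri("http://purl.org/dc/elements/1.1/title"), "A review of OpenLink"),
    # is within topic: resource
    Triple(POST, Uri("http://rdfs.org/sioc/ns#topic"), TOPIC),
    # has label: the subject here is an entity referenced by the post
    Triple(TOPIC, Uri("http://www.w3.org/2000/01/rdf-schema#label"), "Semantic Web"),
]


def describe(subject, graph):
    """Group a subject's statements as {predicate: [objects]}."""
    description = {}
    for t in graph:
        if t.subject == subject:
            description.setdefault(t.predicate, []).append(t.object)
    return description


# Literals are objects that are plain values rather than URI references.
literals = [t for t in triples if not isinstance(t.object, Uri)]
post_description = describe(POST, triples)

for predicate, objects in post_description.items():
    print(predicate, "->", objects)
```

Literal objects (dates, titles, labels) are what you would index or plot; URI objects are the edges you can follow to another node.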

[Depending on provenance, source format, use of aliases, or other changes to make the display of triples more readable, it may at times be necessary to "dereference" what is displayed to obtain the URI values to trace or navigate the actual triple linkages. Dereferencing in this case means translating the displayed portion (the "reference") of a triple to its actual value and storage location, which means providing its linkable URI value. Note that literals are already actual values and thus not "dereferenced".]

The absolutely great thing about RDF is how well it lends itself through subsequent logic (not further discussed here) to map and mediate concepts from different sources into an unambiguous semantic representation [my 'glad' == (is the same as) your 'happy' OR my 'glad' is your 'glad']. Further, with additional structure (such as through RDF-S or the various dialects of OWL), drawing inferences and machine reasoning based on the data through more formal ontologies and descriptive logics is also possible.
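A very reduced Python sketch of the 'glad' and 'happy' mapping follows. It only shows the mediation step, rewriting source-specific terms to one shared concept before merging; it is not RDF-S or OWL reasoning. The prefixes (`mine:`, `yours:`, `shared:`) and the statements are invented for illustration.

```python
# Hypothetical vocabulary mappings: each source term points to one shared
# concept. The prefixes and terms are invented for illustration.
SAME_AS = {
    "mine:glad": "shared:happy",
    "yours:happy": "shared:happy",
    "yours:glad": "shared:happy",
}


def canonical(term):
    """Map a source-specific term to its shared concept, if one is known."""
    return SAME_AS.get(term, term)


def merge(*sources):
    """Merge (subject, predicate, object) statements under shared terms."""
    merged = set()
    for statements in sources:
        for s, p, o in statements:
            merged.add((canonical(s), canonical(p), canonical(o)))
    return merged


my_data = [("person:mike", "feels", "mine:glad")]
your_data = [("person:mike", "feels", "yours:happy")]

merged = merge(my_data, your_data)
print(merged)
```

Once both sources agree on the shared term, the two statements collapse into one.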

How is This Structure Extracted?

The structure extraction necessary to construct an RDF “triple” is thus pivotal, and may require multiple steps. Depending on the nature of the starting content and the participation or not of the site publisher, there is a range of approaches.

Generally, the highest quality and richest structure occurs when the site publisher provides it. This can be done either through various APIs with a variety of data export formats, in which case various converters or translators to canonical RDF may be required by the consumers of that data, or in the direct provision of RDF itself. That is why the conversion of Wikipedia to RDF (done by DBpedia or System One with Wikipedia3) is so helpful.

I anticipate beyond Freebase that other sources, many public, will also become available as RDF or convertible with straightforward translators. We are at the cusp of a veritable explosion of such large-scale, high-quality RDF sources.

The next level of structure extractors are “RDFizers.” These extractors take other internal formats or metadata and convert them to RDF. Depending on the source, more or less structure may be extractable. For example, publishing a Web site with Dublin Core metadata or providing SIOC characterization for a blog (both of which I do for this blog site with available plugins, especially the SIOC Plugin by Uldis Bojārs or the Zotero COinS Metadata Exposer by Sean Takats), adds considerable structure automatically. For general listings of RDFizers, see my recent OpenLink review or the MIT Simile RDFizer site.
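As a rough illustration of what an RDFizer does, the Python sketch below turns Dublin Core `<meta>` tags in a page head into simple triples. It is not any of the tools named above. It assumes the common convention of `DC.`-prefixed meta names, and the class name `DublinCoreExtractor`, the page URI and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

DC_NS = "http://purl.org/dc/elements/1.1/"


class DublinCoreExtractor(HTMLParser):
    """Turn <meta name="DC.xxx" content="..."> tags into triples."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: Dublin Core terms are exposed as "DC.<term>".
        if name.lower().startswith("dc.") and content:
            term = name[3:].lower()
            self.triples.append((self.page_uri, DC_NS + term, content))


# Made-up page head; the URI is a placeholder.
page_html = """
<html><head>
<meta name="generator" content="WordPress" />
<meta name="DC.title" content="Eat Your Greens!" />
<meta name="DC.creator" content="Mike Bergman" />
<meta name="DC.date" content="2007-05-15" />
</head><body></body></html>
"""

extractor = DublinCoreExtractor("http://example.org/eat-your-greens")
extractor.feed(page_html)
for triple in extractor.triples:
    print(triple)
```

The non-DC `generator` tag is ignored; each DC tag becomes one statement about the page.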

We next come to the general area of direct structure extraction, the least developed — but very exciting — area of gleaning RDF structure. There is a spectrum of challenges here.

At one end of the spectrum are documents or Web sources that are published much like regular data records. Like the ZoomInfo listing above or category-oriented sites such as Amazon, eBay or the Internet Movie Database (IMDb), information is already presented in a record format with labels and internal HTML structure useful for extraction purposes.

Most of the so-called Web “wrappers” or extractors (such as Zotero’s translators or the Solvent extractor associated with Simile’s Semantic Bank and Piggy Bank) evaluate a Web page’s internal DOM structure or use various regular expression filters and parsers to find and extract info from structure. It is in this manner, for example, that an ISBN number or price can be readily extracted from an Amazon book catalog listing. In general, such tools rely heavily on the partial structure within semi-structured documents for such extractions.
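Here is a minimal Python sketch of the regular-expression flavor of such a wrapper. The HTML fragment is invented and is not Amazon's actual markup, the patterns are deliberately simple, and no ISBN checksum validation is attempted.

```python
import re

# A made-up fragment shaped like a catalog record.
listing_html = """
<div class="product">
  <span class="label">ISBN-10:</span> <span class="value">0123456789</span>
  <span class="label">Price:</span> <span class="value">$29.95</span>
</div>
"""

ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9Xx\-]{10,17})")
PRICE_RE = re.compile(r"Price:?\s*\$(\d+(?:\.\d{2})?)")


def strip_tags(html):
    """Replace every tag with a space so labels and values line up as text."""
    return re.sub(r"<[^>]+>", " ", html)


def extract_record(html):
    """Pull the labeled ISBN and price out of a record-like fragment."""
    text = strip_tags(html)
    isbn = ISBN_RE.search(text)
    price = PRICE_RE.search(text)
    return {
        "isbn": isbn.group(1) if isbn else None,
        "price": float(price.group(1)) if price else None,
    }


record = extract_record(listing_html)
print(record)
```

Note how much the extraction leans on the labels ("ISBN-10:", "Price:") already present in the page; without them, the patterns would have little to hold on to.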

The most challenging type of direct structure extraction is from unstructured documents. Approaches here use a family of possible information extraction (IE) techniques including named entity extraction (proper names and places, for example), event detection, and other structural patterns such as zip codes, phone numbers or email addresses. These techniques are most often applied to standard text, but newer approaches have emerged for images, audio and video.
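The structural-pattern end of IE is the easiest to sketch. The Python below pulls email addresses, US-style zip codes and phone numbers out of plain text with regular expressions. The patterns are simplified assumptions that will miss many real-world variants, the sample sentence is made up, and named entity extraction proper needs far more than this.

```python
import re

# Simplified patterns; real-world formats vary much more than this.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def extract_entities(text):
    """Return {pattern name: list of matches found in the text}."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


# Made-up sample text.
sample_text = (
    "Contact Jane at jane@example.org or 319-555-0100. "
    "Mail goes to Iowa City, IA 52240."
)

entities = extract_entities(sample_text)
print(entities)
```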

While IE is the least developed of all structural extraction approaches, recent work is showing that it can be done at scale with acceptable precision, and via semi-automated means. This is a key area of development with tremendous potential for payoff, since 80% to 85% of all content falls into this category.

Structure Extraction with OpenLink is a Snap

In the case of my own blog, I have a relatively well-known framework in WordPress with two resident plugins noted above that do automatic metadata creation in the background. With this minimum of starting material, Kingsley was able to produce two RDF extractions from my blog post using OpenLink’s Virtuoso Sponger (see earlier post). Sponger added to the richness of the baseline RDF extraction by first mapping to the SIOC ontology, followed then by mapping to all tags via the SKOS (Simple Knowledge Organization System) ontology and to all Web documents via the FOAF (friend-of-a-friend) ontology.

In the case of Kingsley’s first demo using the OpenLink RDF Browser Session (which gives a choice of browser, raw triples, SVG graph, Yahoo map or timeline views), you can do the same yourself for any URL with these steps:

  1. Go to
  2. Enter the URL of your blog post or other page as the ‘Data Source URI’, then
  3. Go to Session | Save via the menu system or just click on permalink (which produces a bookmark-friendly URL that can then be kept permanently or shared with others).

It is that simple. You really should use the demo directly yourself.

But here is the graph view for my blog post (note we are not really mashing up anything in this example, so the RDF graph structure has few external linkages, resulting in an expected star aspect with the subject resource of my blog post at the center):

Example OpenLink RDF Browser View

If you then switch to the raw triples view, and are actually working with the live demo, you can click on any URI link within a triple and get a JavaScript popup that gives you these further options:

Example View of 'Explore' Popup

The ‘Explore’ option on this popup enables you to navigate to that URI, which is often external, and then display its internal RDF triples rather than the normal page. In this manner, you can “skate” across the Web based on any of the linkages within the RDF graph model, navigating based on objects and their relationships and not documents.
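The “skating” idea is essentially a graph walk. The Python sketch below does a breadth-first traversal over a tiny in-memory graph; it is not how the OpenLink browser is implemented, and in a live setting each hop would mean fetching the RDF behind a URI rather than a dictionary lookup. All identifiers in the graph are invented.

```python
from collections import deque

# A tiny made-up graph: subject -> list of (predicate, object) pairs.
# Objects that also appear as keys are resources we can "skate" to.
GRAPH = {
    "post:review": [("links_to", "site:openlink"), ("topic", "topic:rdf")],
    "site:openlink": [("makes", "product:virtuoso")],
    "topic:rdf": [("related", "topic:sioc")],
    "product:virtuoso": [],
    "topic:sioc": [],
}


def skate(start, graph, max_hops=2):
    """Breadth-first walk returning {resource: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, obj in graph.get(node, []):
            if obj not in seen:
                seen[obj] = seen[node] + 1
                queue.append(obj)
    return seen


reachable = skate("post:review", GRAPH)
for resource, hops in reachable.items():
    print(hops, resource)
```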

The second example Kingsley provided for my write-up was the Dynamic Data Web Page. To create your own, follow these steps:

  1. Go to
  2. Enter your post URL as the Data Source URI
  3. Execute
  4. Click on the ‘Post URI’ in the results table, then
  5. Pick the ‘Dereference’ option (since the data is non-local; otherwise ‘Explore’ would suffice).

Again, it is that simple. Here is an example screenshot from this demo (a poor substitute for working with the live version):

Example OpenLink Dynamic Page View

This option also includes the ‘Explore’ popup.

These examples, plus other live demos frequently found on Kingsley’s blog (none of which requires more than your browser), show the power of RDF structuring and what can be done to view data and produce rich interrelationships “on the fly”.

Come, Join in the Fun

With the amount of RDF data now emerging, rapid events are occurring in viewing and posting such structure. Here are some options for you to join in the fun:

  • Go ahead and test and view your own URLs using the OpenLink demo sites noted above
  • Check out the new DBpedia faceted search browser created by Georgi Kobilarov, one of the founding members of the project. It will soon be announced; keep tabs on Georgi’s Web site
  • And, review this post by Patrick Gosetti-Murrayjohn and Jim Groom on ways to use the structure and searching power of RDF to utilize tags on blogs. They combined RSS feeds from about 10 people with some SIOC data using RAP and Dave Beckett’s Triplr. This example shows how RSS feeds are themselves a rich source of structure.
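On that last point, here is a minimal Python sketch of how much structure sits in an ordinary RSS 2.0 feed. It is not the RAP or Triplr approach from the linked post. The feed content is invented, and the element names are used directly as property names rather than being mapped to a real vocabulary.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; in practice this would be fetched from a blog.
rss_xml = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
      <pubDate>Mon, 23 Apr 2007 10:00:00 GMT</pubDate>
      <category>rdf</category>
      <category>sioc</category>
    </item>
  </channel>
</rss>
"""


def feed_to_triples(xml_text):
    """Return (item link, property, value) triples for each feed item."""
    root = ET.fromstring(xml_text)
    triples = []
    for item in root.iter("item"):
        subject = item.findtext("link")
        for child in item:
            # The link is the subject itself, so skip it as a property.
            if child.tag != "link" and child.text:
                triples.append((subject, child.tag, child.text.strip()))
    return triples


feed_triples = feed_to_triples(rss_xml)
for triple in feed_triples:
    print(triple)
```

Each category element becomes its own statement, which is what makes tags from several people's feeds easy to combine.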