Posted: December 16, 2013

Complementary Efforts of the W3C Mirror the Trend

Two and one-half years ago the triumvirate of Google, Bing and Yahoo! — soon to be joined by Yandex, the major Russian search engine — released schema.org. The purpose of schema.org is to give Web site owners and authors a simple means to tag their sites with a vocabulary, designed to be understandable by search engines, that describes common things and activities on the Web. Though informed and led by innovators with impeccable backgrounds in the early semantic Web and knowledge representation [1], the founders of schema.org also understood that the Web is a messy place with often wrong syntax and usage. Their stated commitment to simplicity and practicality led me to state on the day of release that schema.org was “perhaps the most important event for the structured Web since RDF was released a dozen years ago.”

Just a week ago schema.org version 1.0e was released. That event, plus much else in recent months, suggests a real maturation and uptake of schema.org. It looks like the promise of schema.org is being fulfilled.

Growth and Impact of the Schema

When first released, schema.org provided nearly 300 structured record types that may be used to tag information in Web pages. Via various collaborative processes since, aided by an active discussion group, the schema.org vocabulary has roughly doubled in size. Some key areas of expansion have been in describing various actions, adding basic medical terms, expanding products and transactions via linkages to GoodRelations, covering civic services, and most recently, accessibility. Many other additions are in progress.

In his keynote address at ISWC 2013 in Sydney on October 23, Ramanathan Guha [1] reported that 15 percent of crawled pages and 5 million sites have some schema.org markup. We can also see that some of the most widely used content management systems on the Web, notably including WordPress, Joomla and Drupal, have or plan to have native schema.org support. These tooling trends are important because, though schema.org was designed for simple manual markup, it takes a bit of attention and skill to get the markup right. Having markup added to pages automatically in the background is the next threshold for even broader adoption.
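As a simple illustration of what such markup involves, here is a minimal sketch of schema.org microdata for a movie listing; the values and the choice of properties are purely illustrative:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Gravity</h1>
  <span itemprop="director" itemscope itemtype="http://schema.org/Person">
    directed by <span itemprop="name">Alfonso Cuarón</span>
  </span>
  <div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
    rated <span itemprop="ratingValue">8.2</span>/10
  </div>
</div>

Even in this small example it is easy to go subtly wrong (a misplaced itemscope or a misspelled itemprop simply fails silently), which is why native CMS support matters.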

The ability of the schema.org vocabulary to capture essential domain facts as structured data is reflected in the growing list of prominent sites tagging with it. According to Guha, these include:

Category           Prominent Sites
News               Nytimes, guardian.com, bbc.co.uk
Movies             imdb, rottentomatoes, movies.com
Jobs / careers     careerjet.com, monster.com, indeed.com
People             linkedin.com
Products           ebay.com, alibaba.com, sears.com, cafepress.com, sulit.com, fotolia.com
Videos             youtube, dailymotion, frequency.com, vinebox.com
Medical            cvs.com, drugs.com
Local              yelp.com, allmenus.com, urbanspoon.com
Events             wherevent.com, meetup.com, zillow.com, eventful
Music              last.fm, myspace.com, soundcloud.com
Key Applications   pinterest.com, opentable.com

Examples like Pinterest show how schema.org can also provide a central organizing point for new ventures and applications. There are also key relationships between schema.org and new search initiatives such as Google Now and the Google Knowledge Graph.

From day one schema.org was released with a mechanism for other parties to extend its vocabulary. More recently, however, there has been a significant increase in attention to questions of interoperability and relations to other existing vocabularies. To wit:

  • Prominent knowledge representation experts, such as Peter Patel-Schneider, have become active in suggesting better interoperability and design considerations
  • The root of schema.org is now recognized as owl:Thing
  • Much discussion has occurred on whether and how to integrate or interoperate with SKOS, the Simple Knowledge Organization System
  • Provisions have been added to capture concepts such as domain and range
  • Calls have been made to increase the number of examples and documentation, including enforcing consistency across the vocabulary.

To be clear, it was never the intent for schema.org to become a single, governing vocabulary for the Web. Nonetheless, these broader means to enable others to tie in effectively with it are an indicator that schema.org’s sponsors are serious about finding effective common ground.

Aside from certain areas such as recipes or claiming site or blog ownership, it has been unclear how, or even whether, the search engines are actually using schema.org markup. The sponsors have often stated a go-slow attitude to see whether the marketplace indeed embraces the vocabulary. I’m also sure that the sponsors, as familiar as they are with spam and erroneous markup, have wanted to put in place effective ingest procedures that do not reduce the quality of their search indexes.

Getting Dan Brickley, one of the better-known individuals in RDF and the semantic Web, to act as schema.org’s liaison to the broader community, and beginning to open up about actual usage and uptake of schema.org, are great signs of the sponsors’ commitment to the vocabulary. We should expect to see a much quickened pace and more visibility for schema.org within the search services themselves in the coming months.

W3C’s Complementary Efforts

Meanwhile, back at the ranch, a number of other interesting efforts complementary to these trends are occurring within the World Wide Web Consortium (W3C). As readers of this blog well know, I have argued for some time that RDF makes for a fantastic data model for interoperating disparate content, one that our company Structured Dynamics centrally relies upon, but that RDF is not essential for metadata specification or exchange. Understood serializations based on understood vocabularies — in other words, exactly the design of schema.org — should be sufficient to describe the various types of things and their attributes as may be found on the Web. This idea of structured data in a variety of forms puts control into the hands of content authors. Various markets will determine what makes best sense for them as to how they actually express that structured data.

Last week the W3C announced its retirement of the Semantic Web group, subsuming it instead into the activities of the new W3C Data Activity. The W3C also announced a new group on CSV (comma-separated values) data exchange to go along with recent efforts in JSON-LD (linked data).
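JSON-LD is a good example of this “structured data in a variety of forms” idea: the same schema.org vocabulary, expressed in a developer-friendly JSON serialization. A minimal sketch, with invented values:

{
  "@context": "http://schema.org",
  "@type": "NewsArticle",
  "headline": "The Maturation of schema.org",
  "datePublished": "2013-12-16",
  "author": {
    "@type": "Person",
    "name": "Mike Bergman"
  }
}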

These are great trends, reflecting a predisposition toward adoption. Along with the advances taking place with schema.org, the Web now appears to be entering a golden age of structured data.


[1] For example, a Google Fellow instrumental in founding schema.org is Ramanathan V. Guha, with a background extending back to Cyc and on through Apple and Netscape to what came to be RDF. Guha was also the lead executive behind Google’s Knowledge Graph, which has some key relations with schema.org.

Posted by AI3's author, Mike Bergman, on December 16, 2013 at 11:39 am in Linked Data, Structured Web | Comments (2)
The URI link reference to this post is: https://www.mkbergman.com/1696/the-maturation-of-schema-org/
Posted: September 2, 2013

My WordPress Blog Gets Long in the Tooth; the Upgrade Story

One of the first things I did when I began my blog back in 2005 was to document all of the steps taken to set up and run WordPress on my own server. That how-to document, now retired, was a popular manual for quite a few years. At inception I began with WordPress version 1.5 (it is now at version 3.6), and I modified key files directly to achieve the look-and-feel of the site, rather than using a pre-packaged theme as is the general practice today.

Over the years I have added and replaced plugins, kept WordPress versions current, and made many other changes to the site. Site search, as one case in point, has itself gone through five different iterations. In the early years I also had a small virtual server from a hosting service, which came to perform poorly. It was at that point that I began to learn about server administration and tuning; WordPress is a notable resource hog that can perform poorly on small servers or when not properly tuned.

When Structured Dynamics shifted to cloud computing and software as a service (SaaS) for its own semantic technologies, I made the transition as well with this blog. That transition to AWS worked well, until we started to get Apache server freezes a couple of years back. We did various things to inspect plugins and apache2.conf configuration settings [1] to stabilize operation. Then, a couple of months back, site lockouts became more frequent. Since all obvious and standard candidates had been given attention, our working hypothesis was that my homegrown theme of eight years ago was the performance culprit, possibly due to deprecated WordPress PHP functions or its aging original design.
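For context, the kind of apache2.conf tuning at issue looks something like the following. These are standard Apache 2.2 prefork directives, but the values shown are illustrative assumptions that must be sized to the server's available RAM:

<IfModule mpm_prefork_module>
    StartServers          4
    MinSpareServers       2
    MaxSpareServers       6
    MaxClients           30
    MaxRequestsPerChild 500
</IfModule>
KeepAlive On
KeepAliveTimeout 3

Setting MaxClients too high is a classic cause of memory thrash on small WordPress servers, since each Apache child running PHP can consume tens of megabytes.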

Software problems like this are an abscess. You can ignore them for a while, but eventually you have to lance the boil and fix the underlying problem.

Updating the Theme

A decade of use and refinement leads to a site theme structure and set of style sheets that are nearly indecipherable. Each new feature and each new change results in modifications to these files, which are sometimes then abandoned, though their instructions and data often remain. How this stuff accumulates is the classic definition of cruft.

If one accepts that a theme re-design is at some point inevitable, but that it is preferable to make such changes as infrequently as possible, then it is imperative that a forward-looking theme be chosen for the next-generation re-design. HTML5 and mobile-first are certainly two of the discernible directions in Web design.

With these drivers in mind, I set out to find a new baseline theme. There are many surveys of responsive WordPress themes; I studied and installed many of the leading candidates, ultimately settling upon the underscores theme as my new design basis. I chose underscores (also known as “_s”) because its code is clean and modular; it is designed to be modified and tailored (and not simply “themed” via the interaction of color and layout choices); it is open source; and there is a nice starting utility that prefixes important function calls within the theme.

Though there is a somewhat useful starting guide for underscores, this theme (like many other starting bases for responsive design) requires quite a bit of basic understanding of how WordPress works (comments, the “loop”, etc.) and intermediate or better knowledge of style sheets. A newbie to WordPress, or someone without at least a working familiarity with PHP and CSS, would be pretty challenged to start tearing into a theme like underscores. A good starting point, for example, is WordPress’ own theme tutorial.

Nonetheless, with a modicum of that knowledge, underscores is structured well and it is fairly easy to discern the patterns and design approach. Basic structural building blocks such as headers, footers, pages, comments, etc., can be extended via their own template variants (for example, header.php extended to header-myvariation.php, which WordPress loads via get_header('myvariation')). Most of these specific variations occur around different types of “pages” within WordPress, as the sketch below illustrates.
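As a rough sketch (the variant name and template are hypothetical), a custom page template can request such a variant; WordPress falls back to the plain header.php and footer.php when the named files do not exist:

<?php
/* Template Name: Example Variant Page (hypothetical) */

get_header( 'myvariation' );   // includes header-myvariation.php if present

while ( have_posts() ) :       // the standard WordPress "loop"
    the_post();
    the_content();
endwhile;

get_footer( 'myvariation' );   // likewise includes footer-myvariation.php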

For example, my AI3 blog has its own search form, and has special sections for things like the listing of 1000 semantic technology Sweet Tools or the timeline of information history or chronology of all articles or acronyms. These sections of the site are styled slightly differently, and I wanted to maintain those distinctions.

So, one of the first efforts of the update was to add these template extensions to the baseline underscores, including attention to building-block components and variants such as the header and footer. (The actual first effort was to decide on the basic layout, for which I chose a uniform two-column design rather than the mixed two- and three-column design of my predecessor. Underscores supports a variety of layouts, and may also be integrated with even more flexible grid systems, something I did not do.) The development of template extensions requires a good understanding of the WordPress template hierarchy.

[Image: the WordPress template hierarchy]

Upon getting these structural modifications in place, the next step was to migrate my prior styles to the new framework. I did this basically by “overloading” the underscores style.css with my variants (style-ai3.css) and loading that style sheet afterward.
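A minimal sketch of how such an override sheet can be loaded in the right order from a theme's functions.php; the handle names here are assumptions:

<?php
// Load style-ai3.css after the base style.css by declaring a dependency.
function ai3_enqueue_styles() {
    wp_enqueue_style( 'underscores-base',
        get_template_directory_uri() . '/style.css' );
    wp_enqueue_style( 'ai3-overrides',
        get_template_directory_uri() . '/style-ai3.css',
        array( 'underscores-base' ) );  // dependency forces the load order
}
add_action( 'wp_enqueue_scripts', 'ai3_enqueue_styles' );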

There is much toing-and-froing that occurs when making such changes. A modification is made, the file is updated, and the site page at hand is then re-loaded and inspected. These design iterations can take much time and many tweaks.

Thus, in order not to disrupt my public site while these re-design efforts were underway, I did all development locally on my Windows desktop using the XAMPP package. XAMPP allows all standard Linux components of the traditional LAMP stack to be installed locally on Windows (as well as on other desktop systems). I had used XAMPP many years back. It has really evolved into a slick, well-thought-out package that is easy to install and configure on Windows. In fact, one of the reasons I had dragged my feet about starting the re-design effort was my memory of the challenges of getting a local version running during development. The updated XAMPP package completely removed these concerns.

Also, I made an unfortunate choice during this local development phase. Not knowing whether my migration was going to prove satisfactory in the end, I created a parallel directory for my new theme, ai3v2, and kept the original ai3 directory should I need to return to it. This understandable choice led to another issue later, as I explain below.

Upon getting the development site looking and (mostly) operating the way the original one did, I was then able to upload the new theme to the live site and complete the migration process.

Getting the Site Operational Again

Though my new site did come up at “first twitch” using the redesign, once transferred to the live server a number of issues clearly remained. As I began turning on prior plugins and inspecting the new site, I was seeing problems sporadically appear and disappear. Moreover, once I began viewing my site in other browsers, still other issues appeared. There was, apparently, a cascade of issues facing the new site that needed to be tackled in an orderly manner to get at the underlying problems.

The difference between what I was seeing in my standard browser versus test browsers first indicated there were caching problems between my new site and the viewing (browser) device. I was not initially seeing some of these because key site objects such as images, style sheets and even JavaScript had been cached. (There are also differences in templates and caching when viewing as an administrator versus a standard public user.)

The best way to isolate such effects is to use a proxy server. There are some online services that provide this, mostly used for anonymous surfing, but which can also be used to test caching and browser effects. To see what new public users would see with my new site I used the Zend2.com anonymous online proxy.

The issues I was seeing included:

  • Missing images
  • Missing layouts and style sheets, and
  • Non-performing JavaScript.

Moreover, while I was seeing some improvement in load averages and demands on my site using the Linux top and sar monitoring tools, I was not seeing the 1-minute load averages drop well below 1.00 as they should have. Though load spikes were somewhat better and average loads were somewhat better, too, I was not seeing the kind of load reductions I had hoped for from moving to a more modern theme and updated PHP calls. What the heck was still going on? What really was the root issue?
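For reference, these checks come from the standard tools; the uptime output line is illustrative rather than my actual numbers:

$ uptime
 21:14:02 up 12 days,  3:11,  1 user,  load average: 1.42, 1.18, 0.96
$ sar -q 1 5    # run-queue length and load averages, sampled five times

On a small server, a 1-minute load average persistently above 1.00 per core means requests are queuing rather than being served promptly.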

Tackling the Easy First: Images

The first problem, missing images, was pretty easy to diagnose. My image files had all been organized centrally under the images subdirectory in my theme, such as https://www.mkbergman.com/wp-content/themes/ai3/images/mkbergman1.jpg (which will now show a 404 error). My renaming of the theme directory to ai3v2 was breaking these internal links. The fix was pretty straightforward: using phpMyAdmin, I was able to change the internal database references for files and images to their proper new directory, such as https://www.mkbergman.com/wp-content/themes/ai3v2/images/mkbergman1.jpg. The example SQL for this change is:

UPDATE wp_posts
SET post_content = REPLACE( post_content, 'ai3/', 'ai3v2/' );

With just a bit more minor cleanup, all of these erroneous references were updated. However, this approach also had a side effect: URLs saved by users now pointed to an abandoned subdirectory. That was fixed by adding a redirect to the site’s .htaccess file:

RedirectMatch ^/wp-content/themes/ai3/(.*)$ https://www.mkbergman.com/wp-content/themes/ai3v2/$1

Yet, though these changes made the site more operational, they still did not address the fundamental performance issue.

A Cascade of Caching

Whenever one tests a Web site for performance using services such as Webtest or Google’s PageSpeed, an inevitable top recommendation is to install some form of caching software. Caching software keeps a copy of Web pages and objects on disk (or in memory) so that they can be served statically in response to a Web request, as opposed to being generated dynamically each time a request is made. Caching strategies abound for Web sites, but caches also occur within browsers themselves and in the infrastructure of the Web as a whole (such as the Akamai service). These various caches can be intermixed with CDNs (content delivery networks), whereby common objects such as files or images are served faster with high availability.
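As one illustration of just the browser-cache layer, .htaccess rules like the following tell browsers how long to keep static objects. The directives are standard mod_expires ones, though the lifetimes shown are arbitrary:

<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css   "access plus 1 week"
</IfModule>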

As I tried to turn off various caches and then view changes through the Zend2 anonymous proxy, it became pretty apparent there were many caches in my overall display pathway. Some of the caches, especially those on the server, are also a bit more involved to clear out. (Most server clears involved an SSH root login and sometimes use of a script.) As a measure of what I found in my caching system, here is the cascade I was able to confirm for my site:

server (APC) → WP Super Cache → PageSpeed → [network cache?] → browser cache

APC and the PageSpeed Apache module are particularly difficult to clear on the server. That can also pose difficulties in diagnosing and fixing JavaScript errors and conflicts (see below). In the end, in order to see style and other tweaks immediately, I turned off all caching mechanisms under my control.
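For the record, here is roughly how those two server-side caches get cleared. The PHP snippet is a hypothetical admin-only script, and the mod_pagespeed cache path varies by configuration:

<?php
// Flush the APC opcode and user caches (assumes the APC extension is loaded).
if ( function_exists( 'apc_clear_cache' ) ) {
    apc_clear_cache();          // opcode cache
    apc_clear_cache( 'user' );  // user/data cache
}
// mod_pagespeed has no PHP API; its cache is flushed from a shell instead:
//   sudo touch /var/cache/mod_pagespeed/cache.flush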

What I saw immediately, even before fixing display and style issues, was that the load problems I had been seeing with my site completely disappeared. I also saw — in addition to the immediate improvements — that there were stray files and database remnants from these caches and from tests of prior ones scattered across my legacy site. For example, I had earlier tried caching plugins for WordPress such as W3 Total Cache and Quick Cache, whose old files and data tables were still strewn across my system. Clearly, it was time for a spring cleaning!

Cleaning the Augean Stables with a Vengeance

With the realization I had cruft everywhere, I decided to do a thorough scrubbing of the site from code to stylesheets and on to the MySQL data tables. To accompany my new, cleaner theme I wanted to have an entire system that was clean as well.

An Unemotional Look at Plugins

I spent time looking at each of my site’s plugins. While I do this on occasion from a functionality standpoint, this review explicitly considered functionality in relation to code size and performance. Many venerable WordPress plugins have, for example, expanded in scope and complexity over time. Where I was only using a portion of the full slate of capabilities for a given plugin, I looked at and tested simpler alternatives. For example, earlier I had abandoned the Sociable plugin for ShareThis as the former got bloated; now ShareThis had become bloated as well. I was able to add four lines of code to my theme to achieve the social service references I wanted without a plugin, without JavaScript, and without reports back to tracking sites.

In all, I eliminated two-thirds of my existing plugins through this cold-blooded assessment. It is not worth listing the before and after of this exercise, since the purpose of plugins is to achieve site-specific objectives, and yours will vary. But it is probably a good idea to take periodic inventory of your WordPress plugins and remove or replace some, using performance and bloat as guiding criteria.

Excising Unneeded DB Tables

At the start of this process I had something like 40 or more tables in my MySQL database for the AI3 blog. I was now intent on cleaning all unneeded data out of that database. Some of these tables were prefixed with plugin-specific names; others needed to be looked up and inspected one by one.

It is always a nervous time to make changes directly to a database, so I first backed up my DB and then began to eliminate old or outdated tables. The net result was a reduction of about two-thirds, leaving the eleven standard WordPress data tables:

wp_commentmeta
wp_comments
wp_links
wp_options
wp_postmeta
wp_posts
wp_terms
wp_term_relationships
wp_term_taxonomy
wp_usermeta
wp_users

and another five due to the Relevanssi search plugin (the use of which I described in an earlier post).

One table that deserves particular attention is wp_options, wherein most plugin and standard WordPress settings are recorded. This table needs to be inspected individually to remove unused options. If there is doubt, do a search on the individual option name; most of those related to retired plugins will have similar prefixes. As a result, many rows (options) in that table got removed as well.
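As a sketch of that cleanup (the 'sociable_%' prefix is a hypothetical example for a retired plugin; verify with a SELECT first, and keep that database backup current):

SELECT option_name FROM wp_options WHERE option_name LIKE 'sociable_%';
DELETE FROM wp_options WHERE option_name LIKE 'sociable_%';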

Removing Unneeded CSS

An eight-year-old style sheet, plus the addition of a generic one for the starting underscores theme shell, suggested there were many unused calls in my style sheets. I checked out a number of online options for detecting unused CSS, but did not like them, either as come-ons to paid services or for their degree of obtrusiveness.

As a result, I chose to use CSS Usage, a Firefox addon to Firebug. Set in “auto-scan” mode, it let me view all of the unique page types across my site and then report out a listing of my style sheets, with all unused selectors marked as UNUSED. For readability, I re-established a cleanly formatted CSS file using one of the many online CSS format utilities, and from that point proceeded to delete all of the unused selectors. By keeping a backup, I was able to restore any selectors that were unintentionally deleted.

In the process, I was also able to see opportunities for reuse and generalization in the CSS, which enabled me to make direct modifications and improvements. I then reviewed the site and noted any final CSS corrections needed.

A Critical Review of JavaScript

Finally, some of the JavaScript portions of my site still experienced conflicts, long load times or wrong loading orders. Some of these offenders had been removed during the earlier plugin tests. I still needed to change order and placements, though, for my site’s timeline and for the Structured Dynamics popup tab to get them to work.

Interface Tweaks

Across this entire process I was also inspecting interface components and widgets throughout the site. My prejudice here was to eliminate rarely used features and complicated layouts or designs that added to site complexity or slowed load times. I also tweaked my masthead code to get better display of the four-column elements across browsers and devices. It is still not perfect, but better than it has ever been across this diversity of uses.

Optimizing Anew

I am still building up my site from these steps. For example, various Web page speed tests have indicated improvements, but also further areas for optimization. Currently AI3‘s speed scores range from 90 to 92, better than roughly 85% of Web sites, despite the fact my blog is content- and image-“heavy” with multiple posts on the front page. I tried adding back in WP Super Cache, but then removed it after a few days because load resources remained too high. I most recently tried W3 Total Cache again, and so far am pleased that load averages have declined while page load times have also decreased.

W3 Total Cache is in-your-face with its paid upgrades and self-promotion, and is a total bitch to configure, the same reasons why I abandoned it in the first place. But it does seem to provide a better combination of response times and server demands, more appropriate to a scalable site.

I have continued to look at optimizing image loads and sprites, and am also looking at how external CSS and JS calls can be improved. Though the process has been somewhat time consuming, I now feel I have a pretty good grasp on the moving parts. I should be able to diagnose and manage my system with a much greater degree of confidence and effectiveness going forward.

Some High-level Lessons

Open source and modular code is fantastic, but using it without discrimination or occasional monitoring and testing can eventually lead to untoward effects. Lightweight sites may perhaps get by with minimal attention. However, in the case of this blog, with more than 7000 readers, more attention is required.

The abscess that caused this redesign has now gone away. Site performance is again pretty good, and nearly all components have been looked at with specific attention and diligence. Yet the assumed root cause of these issues may, in fact, not have been the most important one. Rather than outdated themes or PHP functions, the greatest performance hit on this AI3 site appears to have been the unintended and adverse effects of combined caching approaches. So, while it is good that the underlying code and CSS have been updated, it took investigating these issues to turn up the more fundamental importance of caching.

As for next steps, I have learned that these monitoring and focus items are important:

  • Clean DB tables whenever plugins or options are added to or removed from the site
  • Be cognizant of caching residuals and caching conflicts or thrash
  • Keep turning over other rocks and cleaning what is found, since not all reconciliations may yet be complete
  • Give further attention to some mobile and cross-browser display options, and
  • Keep testing and tweaking, since ongoing good performance requires it.

In many respects, these are simply the lessons that any sysadmin learns as he monitors and controls new applications. The real lesson is that if one takes on responsibility for one’s own Web site, then all aspects — including system administration and knowledge discovery — need to be a responsible part of the mix as well.


[1] Depending on the flavor of Linux (my instance is Ubuntu), the name and location of this configuration file may differ (e.g., httpd.conf), or the directives may be placed in .htaccess.

Posted by AI3's author, Mike Bergman, on September 2, 2013 at 10:36 pm in Blogs and Blogging, Site-related | Comments (1)
The URI link reference to this post is: https://www.mkbergman.com/1670/cleaning-up-the-cruft/
Posted: June 4, 2013

Even Our Standard Stuff is Subject to Miscommunication

For some reason, I seem to have been caught in a number of transactions recently where ABSOLUTE precision in communication has been required. One instance involved an insurance policy and when a particular form of coverage becomes active. One instance related to a business communication that led to vendor conflict. One involved the tax authorities — whoops, should perhaps not say more about that one. Others included . . . fill in your own answer.

As someone who prides himself (a bit) on trying to be precise in communications, these circumstances all give pause. Even casual stuff is liable to miscommunication; one never knows. Precision errors may occur through the lack of proper breadth, the absence of sufficient depth, or the lack of clarity in whatever it is you try to say. Precise communication will never be mastered.

Yet, that being said, we must communicate, so we also need some guidelines. I think, firstly, we must speak our minds when the thought and muse strike us. Secondly, we should try to sit on that material a bit before we hit Send.

Honest communication from the heart is warranted in all circumstances, though we may change tone based on our perception of the audience and the perceived potential for misinterpretation. Such perceptions often misjudge audiences. Perhaps the only known is that communications with bureaucracies should be entirely factual, with no adjectives.

In the end, the question we need to ask of our communications is simple: do we wish to achieve an action, or do we wish to go on record? The latter is sometimes more satisfying and occasionally is also the most effective for action. It can be cathartic, yes, and that is also sometimes justification enough to speak truth to power.

But, in most cases, the purpose of communications is to persuade. There needs to be a sensitivity to tone, language and empathy. Because most of our communications are attempts to persuade and not rants, it is clear why so often our communications fail: it is frightfully hard — in the end, near impossible — to walk in someone else’s shoes.

Posted by AI3's author, Mike Bergman, on June 4, 2013 at 11:17 pm in Site-related | Comments (0)
The URI link reference to this post is: https://www.mkbergman.com/1648/every-word-gets-parsed/
Posted: May 21, 2013

First and Largest Local Government Site to Exclusively Embrace Semantic Technologies

The City of Winnipeg, the capital and largest city of Manitoba, Canada, just released its “NOW” portal celebrating its 236 diverse and historical neighborhoods. The NOW portal is one of the largest releases of open data by a local government to date, with some 57 varied datasets now available, ranging from local neighborhood amenities such as pools and recreation centers to detailed real estate and economic development information. Nearly one-half million individual Web pages comprise the site, driven exclusively by semantic technologies, and nearly 10 million RDF triples underlie it.

In announcing the site, Winnipeg Mayor Sam Katz said, “We want to attract new investment to the city and, at the same time, ensure that Winnipeg remains healthy and viable for existing businesses to thrive and grow.” He added, “The new web portal, Neighbourhoods of Winnipeg—or NOW—is one way that we are making it easy to do business within the City of Winnipeg.”

NOW provides a single point of access for information such as location of schools and libraries, Census and demographic information, historical data and mapping information. A new Economic Development feature included in the portal was developed in partnership with Economic Development Winnipeg Inc. (EDW) and Winnipeg REALTORS®.

Our company, Structured Dynamics, was the lead contractor for the effort. An intro to the technical details powering the Winnipeg site is provided in the complementary blog post by SD’s chief technologist, Fred Giasson. These intro announcements by SD will later be followed by more detailed discussions of relevant NOW portal topics in the coming weeks.

Background and Formal Release

But the NOW story is really one of municipal innovation and a demonstration of what a staff of city employees can accomplish when given the right tools and frameworks. SD’s real pleasure over the past two years of development and data conversion has been our role as consultants and advisors while the City itself converted the data and worked the tools. The City of Winnipeg NOW (Neighbourhoods of Winnipeg) site is testament to the ability of semantic technologies to be learned, and effectively used and deployed, by subject matter professionals from any venue.

In announcing the site on May 13, Mayor Sam Katz also released a short four-minute introductory video about the site.

What we find most exciting about this site is how our open source Open Semantic Framework (OSF) can be adapted to cutting-edge municipal open data and community-oriented portals. Without any semantic technology background at the start of the project, the City has demonstrated its ability to manage, master and then tailor the OSF framework to its specific purposes.

Key Emphases

As its name implies, the entire thrust of the Winnipeg portal is on its varied and historical neighborhoods. The NOW portal itself is divided into seven major site sections with 2,245 static pages and a further 425,000 record-oriented pages. The number of dynamic pages that may be generated from the site, given various filtering or slicing-and-dicing choices, is essentially infinite.

Neighborhoods

The fulcrum around which all data on the NOW portal is organized is the set of 236 neighborhoods within the City of Winnipeg, grouped into 14 community areas, 15 political wards, and 23 neighborhood clusters. These neighborhood references link to thousands of City of Winnipeg and external sites, and have many descriptive pages of their own.

Some 57 different datasets contribute the information to the site, some authored specifically for the NOW portal, others migrated from legacy City databases. Coverage ranges from parks, schools, recreational and sports facilities, and zoning to libraries, bus routes, police stations, day care facilities, community gardens and more. More than 1,400 attributes characterize this data, all of which may be used for filtering or slicing it.

Property and Economic Development

A key aspect of the site is its real estate, assessment and zoning information. Every address and parcel in the city — a count nearing 190,000 in the current portal — may be looked up and related to its local and neighborhood amenities. Up to three areas of the City may be mapped and compared to one another, a capability felt to be useful for screening economic development potential.

Census Data

All of the neighborhoods and neighborhood clusters may be investigated and compared for Census data from two time periods (2001 and 2006). Types of Census information include population, education, labor and work, transportation, languages, income, minorities and immigration, religion, marital status, and other family and household measures.

Any and all neighborhoods may be compared to one another on any or all of these measures, with results available in chart, table or export form.

Images and History

Images and history pages are provided for each Winnipeg neighborhood.

Mapping

Throughout, there are rich mapping options that can be sliced and displayed along any of these dimensions of locality, type of information, or attribute.

More to Come!

The basic dataset authoring framework will enable City staff (and, perhaps, external parties or citizens) to add additional datasets to the portal over time.

Key Functionality and Statistics

The NOW site is rich in functionality and in display and visualization options. Some of this functionality includes the following:

NOW Ontology Graph

[Image: NOW ontology graph structure]

NOW is entirely an ontology-driven site, with both domain and administrative ontologies guiding all aspects of search, retrieval and organization. There are 12 domain ontologies governing the site, two of which are specific to NOW (the NOW ontology and a Canadian Census ontology). Ten external ontologies (such as FOAF and GeoNames) are also used.

The NOW ontology, shown above, has more than 2,500 subject concepts covering all aspects of municipal governance and specific Winnipeg factors.
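As a rough sketch only (the namespace and names are invented for illustration), a subject concept in such an ontology might be declared in Turtle along these lines:

@prefix now:  <http://example.org/now#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

now:RecreationCentre a rdfs:Class ;
    rdfs:subClassOf now:CommunityFacility ;
    rdfs:label "Recreation Centre" .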

Relation Browser

All of the 2500 linked concepts in the NOW ontology graph can be interactively explored and navigated via the relation browser. The central “bubble” also presents related, linked information such as images, Census data, descriptive material and the like. As adjacent “bubbles” are clicked, the user can navigate or “swim through” the NOW graph.


NOW Web Maps


Nearly all of the information on the NOW site — about 420,000 records — contains geolocational information of one form or another. There are about 200,000 points-of-interest records, another 200,000 area or polygon records, and about 7,000 paths and routes, such as bus routes, in the system.

All 190,000 property addresses in Winnipeg may be looked up and mapped.

Virtually all of the 57 datasets in the system may be filtered by category, type or attribute. This information can be filtered or searched using about 1,400 different facets, singly or in combination with one another.

Various map perspectives are provided from facilities (schools, parks, etc.) to economic development and history, transportation routes and bus stops, and property, real estate and zoning records.

Templates

Depending on the type of object at hand, one of more than 50 templates may be invoked to govern the display of its record information. These templates are selected contextually from the ontology and present different layouts of map, image, record attribute or other information, all keyed by the governing type.

Each template is thus geared to present relevant information for the type of object at hand, in a layout specific to that object.

Objects lacking their own specific templates default to the display type of their parent or grandparent objects such that no object type lacks a display format.

Multiple templates are displayed on search pages, depending upon the diversity of object types returned by the given search.

[Image: example of a NOW record template]

[Image: example of a NOW Census chart]

Graph Statistics

The NOW site provides a rich set of Census statistics by neighborhood or community area for comparison purposes. The nearly half million data points may be compared between neighborhoods (make sure to pick more than one) in either graph or tabular form.

Census information spans from demographics and income to health, schooling and other measures of community well-being.

Like all other displays, the selected results can also be exported as open data (see below).

Image Gallery

The NOW portal presently has about 2,700 images on site, organized by object type, neighborhood and historical period. These images are contextually available in multiple locations throughout the site.

The History topic section also matches these images to historical neighborhood narratives.


[Image: the structOntology tool in conStruct]

conStruct Tools

A series of twenty or so back-office tools is available to City of Winnipeg staff to grow, manage and otherwise maintain the portal. Some of these tools are exposed in read-only form to the general public (see Geeky Tools, next).

The example shown above is the structOntology tool, for managing the various ontologies on the site.

Geeky Tools

As a means to show what happens behind the scenes, the Geeky Tools section presents a number of the back office tools in read-only form. These are also good ways to see the semantic technologies in action.

The Geeky Tools section provides access to Search, Browse, Ontology, and Export (see next) tools.



Open Data Exports

On virtually any display, or after any filter selection, there is an “export” button that allows the active data to be exported in a variety of formats. Under Geeky Tools it is also possible to export whole datasets or slices of them. Some of the available formats are serializations that are not standard ones for RDF, but follow a notation that retains the unique RDF aspects.

Some Early Lessons

Though the technical aspects of the NOW site have been ready for quite some time, with limited staff and budget it took the City a while to convert all of its starting datasets and to learn how to develop and manage the site on its own. As a result, some of the design decisions made a couple of years back now appear a bit dated.

For example, the host content management system is Drupal 6, though Drupal 8 is getting close to its own release. Similarly, some of the display widgets are based on Flash, which Adobe announced last year it will continue to maintain but no longer develop. And in the two years since the design decisions were originally made, mobile apps, smartphones and tablets have grown tremendously in importance.

These kinds of upgrades are a constant in the technology world, and apply to NOW as well. Fortunately, the underlying basis of the entire portal in its data and stack were architected to enable eventual upgrades.

Another key aspect of the site will be the degree to which external parties contribute additional data. It would be nice, for example, to see the site incorporate event announcements and non-City information on commercial and non-profit services and facilities.

Conclusion

Structured Dynamics is proud of the suitability of our OSF technology stack and is impressed with all the data that is being exposed. Our informal surveys suggest this is the largest open data portal released by a major city worldwide to date. It is certainly the first to be powered exclusively by semantic technologies.

Yet, despite those impressive claims, we submit that the real achievement of this project is something different. The fact that this entire portal is fully maintained and operated by the City’s own internal IT staff is a game changer. The IT staff of the City of Winnipeg had no prior semantic Web knowledge, nor any knowledge of RDF, OWL or the other underlying technologies used by the portal. What they had was a vision of their project and what they wanted. They placed significant faith in, and made a commitment to master, the OSF technology stack and the underlying semantic Web concepts and principles to make their vision a reality. Many of SD’s 430+ documents on the OSF TechWiki are a result of this collaborative technology transfer between us and the City.

We are truly grateful that the City of Winnipeg has taken open source and open data quite seriously. In our partnership with them they have been extremely supportive of what we have done to progress the technology, release it as open source, and then document our lessons and experiences for other parties to utilize on the TechWiki. The City of Winnipeg truly shows enlightened government at its best. Thank you, especially to our project managers, Kelly Goldstrand and Don Conolly.

Structured Dynamics has long stated its philosophy as, “We are successful when we are no longer needed.” We’re extremely pleased and proud that the NOW portal and the City of Winnipeg show this objective is within realistic reach.

Posted: May 15, 2013

Thinking About the Interstices of the Journey

It actually is a dimmer memory than I would like: the decision to begin a blog eight years ago, nearly to the day [1]. Since then, I have posted the results of my investigations or ramblings every month, and often many more times per month, mostly like clockwork. But, in a creeping realization, I see that I have not posted any new thoughts on this venue for more than two months! Until that hiatus, I had been biologically consistent.

Maybe, as some say, and I don’t disagree, the high-water mark of the blog has passed. Certainly blog activity has dropped dramatically. The emergence of ‘snippet communications’ now appears dominant, based on message volumes and bandwidth. I don’t loathe it, nor fear it, but I find a world dominated by 140 characters and instant babbling mostly a bore.

From a data mining perspective — similar to peristalsis or the wave in a sports stadium — there is worth in the “crowd” and its coherence, incoherence and spontaneity. We can see the waves, but most are transient. I actually think such broad scrutiny helps separate the wheat from the chaff; we can expose free radicals to the sanitizing effect of sunlight. Yet these are waves, only very rarely trends, and most generally not truths. That truth stuff is some really slippery stuff.

But that is really not what is happening for me. (Though I really live to chaw on truth.) Mostly, I just had nothing interesting to say, so there was no reason to blog about it. And now, as I look at why I broke my disciplined approach to blogging and why it has gone on hiatus, I am still left scratching my head a bit as to why my pontifications stalled.

Two obvious reasons are that our business is doing gangbusters, and it is hard to sneak away from company good fortune. Another is that my family and children have been joyously demanding.

Yet all of that deflects from the more relevant explanation. The real reason, I think, that I have not been writing more recently actually relates to the circumstance of semantic technologies. Yes, progress is being made, and some instances are notable, but the general “semantic Web” or “linked data” enterprise is stalled. The narrative for these things — let alone their expression and relevance — needs to change substantially.

I feel we are in the midst of this intense interstice, but the framing perspective for the next discussions has yet to emerge.

The strange thing about that statement is that the issue is not the basis in semantic technologies, which are now understood and powerful, but the incorporation of their advantages into enterprise practices and environments. In this sense, semantic technologies are now growing up. Their logic and role are clear and explainable, but how they fit into corporate practice, with acceptable maintenance costs, is still being discovered.