Evolution
AI³
Adaptive Information
Adaptive Innovation
Adaptive Infrastructure
a·dap·tive adj. Showing or having a capacity to make fit for new or special situations; flexible; a successful adjustment.

Date:   January 28, 2013

The Semantic Enterprise Part 4 in the Enterprise-scale Semantic Systems Series

Text, text everywhere, but no information to link!

For at least a quarter of a century the share of enterprise information embedded in text documents has been understood to be on the order of 80%; more recent estimates put that share at 90%. Whatever the exact number, the percentage of information held in documents has been overwhelming for enterprises.

The first documentation systems, Documentum being a notable pioneer, helped keep track of versions and characterized their document stores with some rather crude metadata. As document management systems evolved, and enterprise search became a go-to application in its own right, full-text indexing and search were added to characterize the document store. Search allowed better access to and retrieval of those documents, but still kept documents as an information store separate from the true first-class citizens of information in enterprises: structured databases.

That is now changing — and fast. Particularly with semantic technologies, it is now possible to “tag” or characterize documents not only in terms of administrative and manually assigned tags, but with concepts and terminology appropriate to the enterprise domain.

Early systems tagged with taxonomies or thesauri of controlled vocabulary specific to the domain. Larger enterprises also often employ MDM (master data management) to help ensure that these vocabularies are germane across the enterprise. Even so, such systems rarely interoperate with the enterprise’s structured data assets.

Semantic technologies offer a huge leverage point to bridge these gaps. Being able to incorporate text as a first-class citizen into the enterprise’s knowledge base is a major rationale for semantic technologies.

Explaining the Basis

Let’s start with a couple of semantic givens. First, as I have explained many times on this blog, ontologies (that is, knowledge graphs) can capture the rich relationships between things for any given domain. Second, this structure can be more fully expressed via expanded synonyms, acronyms, alternative terms, alternative spellings and misspellings, all in multiple languages, to describe the concepts and things represented in this graph (a construct we have called “semsets”). That means that different people talking about the same thing with different terminology can communicate. This capability is an outcome of following SKOS-based best practices in ontology construction.
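To make the semset idea concrete, here is a minimal sketch in Python. It assumes each concept carries a preferred label, alternative labels and hidden labels in the spirit of SKOS; the `Concept` class, the example URI and the labels are illustrative assumptions, not part of any particular ontology or toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node in the knowledge graph with its SKOS-style labels (hypothetical)."""
    uri: str
    pref_label: str
    alt_labels: set[str] = field(default_factory=set)     # synonyms, acronyms, translations
    hidden_labels: set[str] = field(default_factory=set)  # common misspellings

    def semset(self) -> set[str]:
        """All normalized surface forms that should resolve to this concept."""
        labels = {self.pref_label} | self.alt_labels | self.hidden_labels
        return {label.casefold() for label in labels}


def build_label_index(concepts):
    """Map every normalized label to the URI of the concept it names."""
    index = {}
    for concept in concepts:
        for label in concept.semset():
            index[label] = concept.uri
    return index


concepts = [
    Concept(
        uri="http://example.org/onto#MasterDataManagement",
        pref_label="master data management",
        alt_labels={"MDM", "gestion des données de référence"},
        hidden_labels={"master data managment"},
    ),
]
index = build_label_index(concepts)

# Different terminology, same concept.
same_concept = index["mdm"] == index["master data managment"]
```

Because every surface form resolves to one URI, two people using different terms (or different languages) end up pointing at the same node in the graph.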

Then, we take these two semantic givens and stir in two further ingredients from NLP. We first prepare the unstructured document text with parsing and other standard text processing. These steps are also a precursor to search; they provide the means for natural language processing to obtain the “chunks” of information in documents as structured data. Then, using the ontologies with their expanded SKOS labels, we add the next ingredient of OBIE (ontology-based information extraction) to automatically “tag” candidate items in the source text.

Editors are presented with these candidates to accept or reject, and can add others, in review interfaces as part of the workflow. The result is the final assignment of subject “tags”. Because it is important to tag both subject concepts and named entities in the candidate text, Structured Dynamics calls this approach “scones”. We have reusable structures and common terminology and syntax (irON) as canonical representations of these objects.
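The tag-then-review loop can be sketched in a few lines of Python. This is a deliberately naive stand-in for ontology-based extraction: it only matches known labels with a regular expression, and the label index, URIs and function names are hypothetical. A production tagger would add tokenization, disambiguation and scoring.

```python
import re

# Hypothetical label index: normalized (casefolded) surface form -> concept URI.
LABEL_INDEX = {
    "enterprise search": "http://example.org/onto#EnterpriseSearch",
    "search": "http://example.org/onto#Search",
    "mdm": "http://example.org/onto#MasterDataManagement",
}


def candidate_tags(text, label_index):
    """Return (matched text, concept URI, offset) for every known label in the text."""
    # Longest labels first so "enterprise search" wins over plain "search".
    labels = sorted(label_index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), label_index[m.group(0).casefold()], m.start())
        for m in pattern.finditer(text)
    ]


def review(candidates, rejected_uris=(), added_uris=()):
    """Apply an editor's decisions: drop rejected candidates, add manual tags."""
    accepted = {uri for _, uri, _ in candidates if uri not in rejected_uris}
    return sorted(accepted | set(added_uris))


text = "Enterprise search improved once MDM vocabularies were in place."
candidates = candidate_tags(text, LABEL_INDEX)
final_tags = review(
    candidates,
    rejected_uris={"http://example.org/onto#MasterDataManagement"},
)
```

The split into `candidate_tags` and `review` mirrors the semi-automatic process: the machine proposes, and the editor has the final say.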

Add Conventional Metadata

Of course, what a document is about is not the only descriptive information you would want to assign to it. Much other structural information describes the document itself.

Some of this information relates to what the document is: its size, its format, its encoding. Some of this information relates to provenance: who wrote it? who published it? when? when was it revised? And, some of this information relates to other descriptive relationships: where to download it? a picture of it; other formats of it. Of course, any additional information useful to describe the document can be also tagged on at this point.

These kinds of metadata are quite familiar to enterprise information architects. They have been common in standard document management systems for three decades or more.

So, naturally, this information has stood the test of time and also must have a pathway for getting assigned to documents. What is different is that all of this information can now be linked into a coherent knowledge graph of the domain.
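One way to picture that linkage is to flatten a document's conventional metadata and its accepted subject tags into the same set of graph statements. The sketch below is an assumption on my part: it borrows Dublin Core term names (`dcterms:creator`, `dcterms:format` and so on) as predicates, uses a made-up `ex:encoding` predicate, and all values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentRecord:
    """Hypothetical record combining conventional metadata with subject tags."""
    uri: str
    # What the document is
    media_type: str
    size_bytes: int
    encoding: str
    # Provenance
    author: str
    publisher: str
    created: date
    # What the document is about (accepted subject tags, as concept URIs)
    subjects: list[str] = field(default_factory=list)

    def triples(self):
        """Yield (subject, predicate, object) statements for the knowledge graph."""
        yield (self.uri, "dcterms:format", self.media_type)
        yield (self.uri, "dcterms:extent", f"{self.size_bytes} bytes")
        yield (self.uri, "ex:encoding", self.encoding)  # illustrative predicate
        yield (self.uri, "dcterms:creator", self.author)
        yield (self.uri, "dcterms:publisher", self.publisher)
        yield (self.uri, "dcterms:created", self.created.isoformat())
        for concept_uri in self.subjects:
            yield (self.uri, "dcterms:subject", concept_uri)


record = DocumentRecord(
    uri="http://example.org/doc/1",
    media_type="application/pdf",
    size_bytes=20480,
    encoding="utf-8",
    author="A. Author",
    publisher="Example Corp",
    created=date(2013, 1, 28),
    subjects=["http://example.org/onto#EnterpriseSearch"],
)
statements = list(record.triples())
```

Once both kinds of information share one representation, a query can filter on provenance and subject at the same time.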

Some Interface and Workflow Considerations

What we are seeking is a framework and workflow that naturally allows all existing and new documents to be presented through a pipeline that extends from authoring and review to metadata assignments. This workflow and the user interface screens associated with it are the more difficult aspects of the challenge. It is relatively straightforward to configure and set up a tagger (though, of course, better accuracy and suitability of the candidate tags can speed overall processing time). Making final assignments for subject tags from the candidates and then ensuring all other metadata are properly assigned can be either eased or impeded by the actual workflows and interfaces.

The trick to such semi-automatic processes is to get these steps right. Manual overrides are needed when the suggested candidate tags are not right. Sometimes new terms and semset entries are found when reviewing the processed documents; these need to be entered and then placed into the overall domain graph structure as discovered. The process of working through steps on the tag processing screens should be natural and logical. Some activities benefit from very focused, bespoke functionality, rather than calling up a complicated or comprehensive app.

In enterprise settings these steps need to be recorded, subject to reviews and approvals, and auditable should anything go awry. This means there needs to be a workflow engine underneath the entire system, recording steps and approvals and enabling work to be picked up at any intermediate, suspended point. These support requirements tend to be unique to each enterprise; thus, an underlying workflow system that can be readily modified and tailored (perhaps through scripting or configuration interfaces) is favored. Since Drupal is our standard content and user interface framework, we tend to favor workflow engines like State Machine over narrower, out-of-the-box setups such as the Workflow module.
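The requirements in that paragraph (recorded steps, approvals, audit trail, resumable state) boil down to a small state machine with a log. The Python sketch below is generic and does not reflect the API of any Drupal module; the state names, actions and class are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions: state -> {action: next state}. Hypothetical states;
# an enterprise would tailor this table rather than the code around it.
TRANSITIONS = {
    "authored": {"submit": "tagging"},
    "tagging": {"propose_tags": "review"},
    "review": {"approve": "published", "reject": "tagging"},
    "published": {},
}


@dataclass
class DocumentWorkflow:
    doc_id: str
    state: str = "authored"
    audit_log: list = field(default_factory=list)

    def apply(self, action, user):
        """Perform one step if it is allowed, and record who did what and when."""
        allowed = TRANSITIONS[self.state]
        if action not in allowed:
            raise ValueError(f"{action!r} not allowed from state {self.state!r}")
        previous, self.state = self.state, allowed[action]
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "from": previous,
            "to": self.state,
        })
        return self.state


wf = DocumentWorkflow("doc-42")
wf.apply("submit", "alice")
wf.apply("propose_tags", "tagger-service")
wf.apply("reject", "bob")  # reviewer sends the document back for re-tagging
```

Because the current state and the audit log are plain data, they can be persisted and reloaded, which is what lets work resume from a suspended point.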

These screens and workflows are not integral to the actual semantic framework that governs tagging, but are essential complements to it. It is but another example of how the semantic technologies in an enterprise need to be embedded and integrated into a non-semantic environment (see the prior architecture piece in this series).

But, Also Some Caveats

Yet, what we have described above is the technology and process of assigning structured information to documents so that they can interoperate with other data in the enterprise. Once linked into the domain’s knowledge graph and once characterized by the standard descriptive metadata, there is now the ability to search, slice, filter, navigate or discover text content just as if it were structured data. The semantic graph is the enabler of this integration.

Thus, the entire ability of this system to work derives from the graph structure itself. Creating, populating and maintaining these graph structures can be accomplished by users and subject matter experts from within the enterprise, but that requires new training and new skills. It is impossible to realize the benefits of semantic technologies without knowledgeable editors to maintain these structures. Because of its importance, a later part in this series deals directly with ontology management.

While ontology development and management are activities that do not require programming skills or any particular degrees, they do not happen by magic. Concepts need to be taught; tools need to be mastered; and responsibilities need to be assigned and overseen to ensure the enterprise’s needs are being met. It is exciting to see text become a first-class information citizen in the enterprise, but like any purposeful human activity, success ultimately depends on the people involved.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

Posted by AI3's author, Mike Bergman

Posted on January 28, 2013 at 8:38 am in Document Assets, Enterprise-scale Semantic Systems, Information Automation, Linked Data, Open Semantic Framework, Searching, Semantic Enterprise, Structured Dynamics, Structured Web | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1612/making-text-a-first-class-citizen/
Date:   January 20, 2013

The Semantic Enterprise Part 3 in the Enterprise-scale Semantic Systems Series

The interests of enterprise architects and semantic technologists do not align. An enterprise architect has the viewpoint of the enterprise and its full breadth of IT requirements, from security to access to content and maintainability (all of which needs to be justified to non-IT managers). The semantic technologist tends to view his entire world through the lens of semantic technologies.

If one is a resident within the semantic technology community, more often than not today’s assessment is that semantics have yet to be successful. If the deployment somehow does not have semantic technologies front and center, then it is largely invisible. The fact that semantic technologies are the core enablers for initiatives ranging from Siri to Pandora to Google and recommendation engines is not embraced and credited: the semantic contribution is hidden.

If one is an enterprise architect, the primacy of whether semantic technologies are in play or not is a non-issue. There are many piece parts to be fulfilled; the system and overall architecture are the concern, not any individual component. The architecture must be broken apart, with the assessment of the suitability of any individual component not based solely on its standalone capabilities, but also as part of an inter-operating whole.

Semantic technology has generally not penetrated well into the enterprise (though it sometimes has in some of the consumer plays as noted above) because its advocates (and, therefore, deployers) have not understood its role. Sometimes semantic technologies are visible, but, more often than not, they are not. The natural role of semantic technologies is in content and schema mediation, functions which reside generally at the repository level and not that of the user.

Two rending forces arise from the wrongful perception that somehow semantic technologies must be evident. The first dissonance is that semantic advocates are often indiscriminate in where they focus their advocacies. While semantic approaches can, theoretically, be applied from the user content management level to applications, these are neither the pain points nor the focus of enterprise architects. EAs are interested in semantic technologies for content integration and interoperability, as often evidenced by superior search, not other uses. The second dissonance is that, not recognizing its natural role, semantic technologists are not paying attention to making their capabilities inter-operable with the rest of the enterprise stack.

Actual enterprise deployments have a rhythm and hierarchy of scrutiny and decision-making. For semantics to become an integral contributor to enterprise solutions, it is important to recognize where this function can fit today. There should be no arrogance in this discussion whatsoever. Like a Galileo thermometer, it is important to find the natural resting point for semantic contributions . . . .

A Basic Architecture

As other discussions by Fred Giasson and me have put forward, our (Structured Dynamics) semantic stack, what we call the open semantic framework (OSF), has Drupal as its resident content management piece, Virtuoso as the RDF triple store, and many additional open source parts. Our TechWiki explains more of that in detail.

The OSF architecture, though, is generalized enough such that these two components, or any of the other open source pieces in the stack, could be swapped out for others. It is the Web service glue underneath OSF, SD’s structWSF framework, that is the real enabler of the entire semantic stack.

Yet when one is done with the design of an enterprise architecture, the actual semantic portion (shown in green below) becomes itself a mere component, embedded into the full suite of enterprise requirements. This illustrative architecture, generalized across clients, again uses Drupal as the content management framework, with the new service being hosted in the cloud:

Figure: Overall Enterprise-scale Semantic Architecture

We see that a security component now governs all interactions. Middleware has been inserted into the standard OSF stack (Drupal + semantic services) and now takes over the functions of logging, messaging, an enterprise service bus, security, version control and data governance. All of our hardware, network services, and Web servers are provided in the cloud. We also need to conform to existing content and data sources and the means to harvest or get updates from them.

The semantic component — OSF in our case — has in effect been surrounded by existing or external sources and services. The semantic management responsibility resides at the core of this architecture, thus making the content repository very important. But, in order for the repository to perform its work, it must interface with all of these existing and required systems. In order for semantics to make enterprise contributions, it must become, in effect, a “hidden” or “buried” service.

When targeting enterprise customers, this role for semantic technologies is a reality. For systems to be adopted, which is the first step to being effective, it is helpful to warmly embrace that your installation will be as much involved with interfaces and external sources and systems as with semantics alone. Embracing this viewpoint means you are being adopted.

This reality does place a premium on a Web service architecture for the semantic stack. All endpoints can be communicated with via HTTP, and all endpoints have a common and published API. Each re-factoring stresses making the interfaces distinct and clean, and embracing common syntax and protocols for communicating with the endpoints.

The Natural Resting Place for Semantic Technologies

We can expand the green portion of the diagram above — the semantic components or what corresponds to OSF — and show them in more detail, as in the next architecture diagram below. We are now enumerating the Web services in the stack, and are showing the interaction with datasets (important for the security aspects, which a later installment of this series will address). The various engines that power the OSF stack are shown at bottom:

Figure: OSF Portion of the Enterprise-scale Semantic Architecture

While it is true that the semantic components are “buried” within all portions of the enterprise stack, we can ease the integration challenge by narrowing the interface points to the non-semantic portions. At the top level, in the interaction with the content management framework (Drupal in our case), we have aggregated all Web service interface calls and made them available programmatically via the structWSF PHP API. (Multiples of these can be developed if the programmatic interfaces need to be in languages other than PHP, such as Java.)

The structWSF API provides a consolidated point for writing endpoint calls and queries using PHP. This not only makes it more efficient to develop endpoint connectors (whose purpose is to enable Drupal methods and modules for interfacing with the repository), but also provides a common API and methods. Though it is possible to issue queries directly to any structWSF Web service endpoint, the structWSF API module is a faster and more consistent interface for doing so. This consolidation also means that developers interacting with the semantic components need only worry about the dedicated API module, and not the code or location of the more than 20 individual endpoints.
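The consolidation pattern itself is language-neutral. The Python sketch below is not the structWSF PHP API; it only illustrates the idea of one client object that knows where endpoints live and how requests are shaped. The endpoint paths, parameter names and the choice of POST are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


class SemanticApiClient:
    """Single place where endpoint URLs, headers and encoding are decided."""

    # Hypothetical endpoint paths, for illustration only.
    ENDPOINTS = {
        "search": "/ws/search/",
        "crud_read": "/ws/crud/read/",
    }

    def __init__(self, base_url, accept="application/json"):
        self.base_url = base_url.rstrip("/")
        self.accept = accept

    def build_request(self, endpoint, **params):
        """Return a prepared HTTP request; the caller decides when to send it."""
        url = self.base_url + self.ENDPOINTS[endpoint]
        body = urlencode(params).encode("utf-8")
        return Request(url, data=body, method="POST",
                       headers={"Accept": self.accept})


client = SemanticApiClient("http://localhost/")
request = client.build_request("search", query="ontology", items=10)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Callers name an endpoint and pass parameters; if an endpoint moves or its conventions change, only the client changes.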

A similar philosophy is applied to narrowing the security interfaces. We treat security as a black box. Granting access and rights is proxied at the middleware layer. If these rights are granted, the query payload is presented to the Auth:Validator endpoint via a registered security gateway IP. The verification of the IP by Auth:Validator enables the query to be submitted, with a results set also returned via the same pathway.
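The two-stage check described above can be sketched as follows. This is a simplified illustration, not the behavior of the actual Auth:Validator endpoint: the gateway address, the permissions table and the function names are all hypothetical.

```python
import ipaddress

# Hypothetical registered security gateway; only requests relayed from here
# are accepted by the semantic layer.
REGISTERED_GATEWAYS = {ipaddress.ip_address("10.0.0.5")}


def middleware_grants(user, dataset, permissions):
    """Stage 1 (middleware): does this user have rights on this dataset?"""
    return dataset in permissions.get(user, set())


def validator_accepts(source_ip):
    """Stage 2 (semantic layer): did the request arrive via a registered gateway?"""
    return ipaddress.ip_address(source_ip) in REGISTERED_GATEWAYS


def submit_query(user, dataset, query, source_ip, permissions, run_query):
    """Run the query only if both stages pass; results return the same way."""
    if not middleware_grants(user, dataset, permissions):
        raise PermissionError(f"{user} has no rights on {dataset}")
    if not validator_accepts(source_ip):
        raise PermissionError(f"{source_ip} is not a registered gateway")
    return run_query(dataset, query)


permissions = {"alice": {"hr-docs"}}
results = submit_query(
    "alice", "hr-docs", "ontology", "10.0.0.5", permissions,
    run_query=lambda dataset, query: [f"{dataset}:{query}"],
)
```

The semantic layer never inspects who the user is; it only needs to trust the gateway, which keeps the security logic a black box from its point of view.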

Three design mindsets govern this architectural design. First, interface points are narrowed and standardized, generally with a formal API. Second, important external services are treated as “black boxes”; how they do their work is immaterial. Only vetted requests and calls approved at these other layers are able or authorized to access the services at the semantic layer. And, third, we are not trying to embrace non-semantic functionality at the semantic layer. These important services — but ancillary ones from a semantic standpoint — are understood as being out of scope to the semantic requirements. This design also makes it easier to “plug” the semantic components into other enterprise stack configurations with other non-semantic services from other sources or vendors.

Some Development Gaps and Imperatives

This design makes sense from a theoretical standpoint, but can pose problems in practice.

The first challenge is that our OSF approach is based on RESTful Web services, in a true Web-oriented architecture. Many of the non-semantic legacy components were originally designed for formal big WS-* approaches drawn from the SOAP perspective. Though most of these existing interfaces have evolved to embrace RESTful alternatives, these interfaces are not always as well tested and complete as the original WS-* ones. This relative immaturity can pose issues with respect to completeness of parameter or function support or inadequate testing.

A second challenge, also related to a RESTful Web service perspective, is the size of payloads in both query and results set objects. Long HTTP queries with many parameter requests and large results sets can be a problem to handle, especially in the security layer. In some cases, we have had to look at ways to minimize and package (consolidate) parameter options in order to make endpoint requests more efficient.
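One possible reading of "minimize and package" is sketched below in Python: drop parameters that merely restate defaults, and fold multi-valued options into a single delimited value. The default values, parameter names and separator are assumptions; the receiving endpoint would have to agree on the same conventions.

```python
from urllib.parse import urlencode

# Hypothetical defaults that the receiving endpoint already assumes.
DEFAULTS = {"page": 0, "items": 10, "include_aggregates": False}


def pack_params(params, defaults=DEFAULTS, sep=";"):
    """Build a compact query string from a dict of request parameters."""
    packed = {}
    for key, value in params.items():
        if key in defaults and defaults[key] == value:
            continue  # no need to send what the receiver already assumes
        if isinstance(value, (list, tuple, set)):
            # One delimited value instead of many repeated parameters.
            value = sep.join(sorted(str(v) for v in value))
        packed[key] = value
    return urlencode(packed)


query_string = pack_params({
    "query": "ontology",
    "items": 10,                  # equals the default, so it is dropped
    "datasets": ["b", "a"],       # folded into a single value
})
```

A shorter, more regular request is easier for an intermediate security layer to inspect and less likely to hit URL or header length limits.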

Encoding mismatches are a further challenge. It is generally best, for example, to adhere to a standard UTF-8 encoding via all semantic component interfaces. This requires attention and coordination on both sides of the interface.
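A small boundary function can enforce that convention. The sketch below assumes the sender declares its character set; the Unicode NFC normalization step is an extra assumption of mine, not something the text above calls for.

```python
import unicodedata


def to_utf8(payload: bytes, declared_charset: str = "utf-8") -> bytes:
    """Re-encode an incoming payload as normalized UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in the declared
    charset, so a mismatch surfaces at the interface instead of corrupting
    text further downstream.
    """
    text = payload.decode(declared_charset)
    return unicodedata.normalize("NFC", text).encode("utf-8")


latin1_bytes = "café".encode("latin-1")
clean = to_utf8(latin1_bytes, declared_charset="latin-1")
```

Note that some single-byte charsets (Latin-1, for example) can decode any byte sequence, so this check catches bytes wrongly declared as UTF-8 but cannot detect every mislabeled payload.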

The more fundamental challenge, however, is one of mindset. Effective interfaces require effective communication among the participating vendors across the boundary. The terminology, concepts, logic and open-world approach of semantic technologies are not easily communicated to nor immediately understood by traditional practitioners. The communications must be constantly worked at in order to overcome past practices and embrace the flexibilities provided by semantic technologies.

The Mismatch is Not Long-term

But these challenges are more matters of degree and practice than anything more fundamental. As semantic components get deployed in an enterprise stack, the benefits of faceting and the underlying structure become apparent. Such awareness propels further understanding and a willingness to learn more about underlying foundations. Ultimately, with a design emphasizing relatively few, focused interfaces, semantic components can be effectively integrated within enterprise stacks.

The more telling lesson is the understanding of the natural role that semantic technologies play within enterprise-scale systems. Semantic technologies are the natural integration framework for federating and interoperating virtually any and all non-transaction information assets of the enterprise. That places semantic technologies at the core of the enterprise stack, even if it is not terribly evident to all users. The natural role for semantic technologies for the nearest term appears to be in repositories and for content integration.


Posted by AI3's author, Mike Bergman

Posted on January 20, 2013 at 10:45 pm in Enterprise-scale Semantic Systems, Linked Data, Open Semantic Framework, Semantic Enterprise, Structured Dynamics, Web-oriented Architecture | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/1608/architecting-semantic-technologies-for-the-enterprise/
Date:   January 14, 2013

The Semantic Enterprise Part 2 in the Enterprise-scale Semantic Systems Series

Those involved with the semantic Web are passionate as to why they are involved. This passion and the articulateness behind it are notable factors in why there is indeed a ‘semantic Web community.’ Like few other fields — perhaps including genomics or 3D manufacturing — semantic technologies tend to attract exceptionally smart, committed and passionate people.

Across this spectrum of advocates there are thousands of pages of PDFs and academic treatises as to semantic this or semantic that. There is gold in these hills, and much to mine. But, both in grants and in approaching customers, it always comes down to the questions of: What is the argument for semantic technologies? What are the advantages of a semantic approach? What is the compelling reason for spending time and money on semantics as opposed to alternatives?

Fred Giasson and I at Structured Dynamics feel we have done a pretty fair job of answering these questions. Of course, it is always hard to prove a negative — how do the arguments we make stack up against those we have not? We will never know.

Yet, on the other hand, we have found dedicated customers and steady and growing support from the arguments we do make. At least we know we are not scaring potential customers away. Frankly, we suspect our market arguments are pretty compelling. While we discuss many aspects of semantic technologies in our various writings and communications, we have also tended to continually hone and polish our messages. We keep trying to focus. Fewer points are better than more, and points that resonate with the market (that address the “pain points” in common parlance) have the greatest impact.

It is also obvious that the arguments an academic needs to make to a funding agency or commission are much different than what is desired by commercial customers. (Not to mention the US intelligence community, which is the largest — yet silent — funder of semantic technologies.) Much of what one can gain from the literature is more of this academic nature, as are most discussions on mailing lists and community fora. We distinctly do not have the academic perspective. Our viewpoint is that of the enterprise, profit-making or non-profit. Theory takes a back seat to pragmatics when there are real problems to solve.

Our three main selling points to this enterprise market relate to data integration and interoperability; search and discovery; and leveraging existing information assets with low risk. How we paint a compelling picture around these topics is discussed for each point below. We conclude with some thoughts about how we communicate these arguments, perhaps offering some background that others might find useful when making such arguments themselves.

“Semantic Technologies Enable Data Integration and Interoperability”

As I have experienced first hand and have argued many times [1], the Holy Grail of enterprise information technology over the past thirty years has been achieving true data integration and interoperability. It is, I believe, the primary motivating interest for almost all IT efforts not directly related to conventional transaction systems. Yet, because of this longstanding and abiding interest, enterprise IT managers react with justifiable skepticism every time new advances in interoperability are claimed.

The claims for semantic technologies are not an exception. But, even in its positioning, there is something in the descriptive phrasing of “semantic technologies” that resonates with the market. Moreover, to overcome the initial skepticism, we also tend to emphasize two bolstering arguments promoting interoperability:

  1. Semantic technologies matched with natural language processing (NLP) techniques work to integrate unstructured data, finally incorporating the 80% of enterprise information locked up in documents and overcoming the limitations of manually assigned tags, and
  2. The RDF data model is capable of capturing any existing data relationship, and ontologies are capable of capturing any existing information schema.

Since these are two of the core aspects to data integration and have heretofore been limited with conventional approaches, and since they can be demonstrated rather quickly, trust can be placed into the ultimate interoperability argument.
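The second point can be demonstrated quickly with a toy conversion. The Python sketch below turns one relational row into subject-predicate-object statements; the URI scheme, table and column names are illustrative assumptions, and a real mapping would follow whatever conventions the enterprise's ontology defines.

```python
def row_to_triples(table, primary_key, row, base="http://example.org/"):
    """Express one relational row as (subject, predicate, object) statements."""
    subject = f"{base}{table}/{row[primary_key]}"
    triples = [(subject, "rdf:type", f"{base}{table}")]
    for column, value in row.items():
        if column == primary_key or value is None:
            continue  # a NULL is simply a statement that is not made
        triples.append((subject, f"{base}{table}#{column}", value))
    return triples


customer_row = {"id": 7, "name": "Acme", "country": None}
customer_triples = row_to_triples("customer", "id", customer_row)
```

Each column becomes a predicate and each row an identified subject, so records from different tables, or from tagged documents, end up in one uniform shape.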

In the end, the ability of semantic technologies to promote rather complete data integration and interoperability will prove to be its most compelling rationale. Yet, achieving this with semantic technologies will require more time and broader scope than what has been instituted to date. By starting smaller and simpler, a more credible entry argument can be made that also is on the direct pathway to interoperability benefits.

“Semantic Technologies Improve Search and Discovery”

On the face of it, search engines and the search function are nearly ubiquitous. Further, search is generally effective in eventually finding information of interest, though sometimes the process of getting there is lengthy and painful.

This inefficiency results because search has three abiding problems. First, there is too much ambiguity in what kind of thing is being requested; disambiguation to the context at hand is lacking. Second, there is a relative lack of richness in the kinds of relationships between things that are presented. We are learning through Web innovations like Wikipedia or the Google Knowledge Graph that there are many attributes that can be related to the things we search. The natural desire is to now see such relationships in enterprise search as well, including some of this public, external content. And, third, because of these two factors, search is not yet an adequate means for discovering new insights and knowledge. We see the benefits of serendipitous discovery, but we have not yet learned how to do this with purpose or in a repeatable way.

More often than not, customers see search, with better display of results, as the heart of the budget rationale for semantic projects. The graph structures of semantic schema mean that any node can become an entry point to the knowledge space for discovery. The traversal of information relationships occurs from the selection of predicates or properties that create this graph structure in the first place. This richness of characterization of objects also means we can query or traverse this space in multiple languages or via the full spectrum by which we describe or characterize things. Semantic-based knowledge graphs are potentially an explosion of richness in characterization and in how those characterizations get made and referred to by any stakeholder. Search structure need not be preordained by some group of designers or information architects, but can actually be a reflection of its user community. It should not be surprising that search offers the quickest and most visible path to conveying the benefits of semantic technologies.
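The "any node is an entry point" idea can be shown with a breadth-first walk over a handful of statements. The triples and predicate names below are made up for illustration; the traversal follows each predicate in both directions so that a search can start from whichever node the user happens to land on.

```python
from collections import defaultdict, deque

# Illustrative statements; the ex: predicates are hypothetical.
TRIPLES = [
    ("ex:Documentum", "ex:isA", "ex:DocumentManagementSystem"),
    ("ex:DocumentManagementSystem", "ex:supports", "ex:FullTextSearch"),
    ("ex:FullTextSearch", "ex:broader", "ex:Search"),
]


def build_adjacency(triples):
    """Index every statement from both ends so traversal can start anywhere."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
        adjacency[o].append((f"^{p}", s))  # "^" marks the inverse direction
    return adjacency


def discover(start, adjacency, max_hops=2):
    """Return {node: hop count} for everything reachable within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for _, neighbor in adjacency[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


adjacency = build_adjacency(TRIPLES)
nearby = discover("ex:FullTextSearch", adjacency, max_hops=1)
```

Restricting the walk to selected predicates, or weighting them, is where a real discovery interface would add its own logic.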

These arguments, too, are a relatively quick win. We can rapidly put in place the semantic structures that make improved search benefits evident. There are two nice things about this argument. First, it is not necessary to comprehensively capture the full knowledge domain of the customer’s interests to show these benefits. Relatively bounded projects or subsets of the domain are sufficient to show the compelling advantages. And, second, as this initial foothold gets expanded, the basis for the next argument also becomes evident.

“Semantic Technologies Leverage Existing Assets with Low Risk”

I have often spoken about the incremental nature of how semantic technologies might be adopted and the inherent benefits of the open world mindset. This argument is less straightforward to make since it requires the market to contemplate assumptions they did not even know they had.

But, one thing the market does know is the brittleness and (often) high failure rates of knowledge-based internal IT projects. An explication of these causes of failure can help, via the inverse, to make the case for semantic technologies.

We know (or strongly suspect), for example, that these are typically the causes of knowledge-based IT failures:

  • Too broad a scope or the need to embrace too much of the information basis of the domain
  • Changing knowledge and circumstances that cause initial design imperatives to change over the course of a project
  • High visibility for multiple audiences and stakeholders, and no workable means for finding a common view or consensus as to objectives (let alone terminology) for the project amongst these stakeholders.

Getting recognition for these types of failures or challenges creates the opening for discussing the logical underpinnings of conventional IT approaches. The conventional closed-world approach, which is an artifact of using information systems developed for transaction and accounting purposes, is unsuited to open-ended knowledge purposes. The argument and justification for semantic technologies for knowledge systems is that simple.

The attentive reader will have seen that the first two arguments presented above already reify this open world imperative. The integration argument shows the incorporation of non-structured content as a first-class citizen into the information space. The search argument shows increased scale and richness of relationships as new topics and entities get added to the search function, all without adversely impacting any of the prior work or schema. For both arguments, we have expanded our scope and schema alike without needing to re-architect any of the semantic work that preceded it. This is tangible evidence for the open world argument in the context of semantic technologies applied to knowledge problems.
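A toy example makes the open world point tangible. In the Python sketch below (all identifiers are hypothetical), new statements with a previously unknown predicate are simply added to the graph, and an existing query keeps returning the same answer; a missing statement is read as "unknown" rather than "false".

```python
graph = {("ex:doc1", "ex:about", "ex:Search")}


def ask(graph, triple):
    """Open-world reading: absence of a statement means unknown, not false."""
    return True if triple in graph else None


def about(graph, doc):
    """An 'existing' query, written before any new sources were added."""
    return {o for s, p, o in graph if s == doc and p == "ex:about"}


before = about(graph, "ex:doc1")

# A new source arrives with a predicate the original design never anticipated.
# No schema migration and no change to existing statements or queries.
graph |= {
    ("ex:doc1", "ex:reviewedBy", "ex:alice"),
    ("ex:doc2", "ex:about", "ex:MDM"),
}

after = about(graph, "ex:doc1")
unknown = ask(graph, ("ex:doc2", "ex:reviewedBy", "ex:alice"))
```

In a closed-world relational design the same change would typically mean altering a table or adding a new one, and revisiting the queries that depend on it.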

This evidence, plus the fact that we have been steadily incorporating more sources of information with varied structure, most of which already exist within the enterprise’s information assets, shows that semantic technologies can leverage benefits from existing assets at low risk. At this point, if we have told our story well, it should be evident that the semantic approach can be expanded at whatever pace and scope the enterprise finds beneficial, all without impacting what has been previously implemented.

Actually, the argument that semantic technologies leverage existing assets with low risk is perhaps the most revolutionary of the three. Most prior initiatives in the enterprise knowledge space have required wholesale changes or swapping out of existing systems. The unique contribution of semantic technologies is that they can achieve their benefits as a capability layered over existing assets, all without disruption to their existing systems and infrastructure. The degree to which this layering takes place can be driven solely by available budgets with minimal risk to the enterprise.

Ambassadors and Archivists, as well as Entrepreneurs

There are, of course, other messages that can be made, and we ourselves have made them in other circumstances and articles. The three main arguments listed herein, however, are the ones we feel are most useful at the time of early engagement with the customer.

Our messages and arguments gain credibility because we are not just trying to “sell” something. We understand that semantic technologies and the mindsets behind them are not yet commonplace. We need to be ambassadors for our passion and work to explain these salient differences to our potential markets. As later parts in this series will discuss, with semantic technologies, one needs to constantly make the sale.

The best semantic technology vendors understand that market education is a core component to commercial success. Once one gets beyond the initial sale, it is a constant requirement to educate the customer with the next set of nuances, opportunities and technologies.

We acknowledge that vendors have other ways to generate “buzz” and “hotness.” We certainly see the consumer space filled with all sorts of silliness and bad business models. But our pragmatic approach is to back up our messaging with full documentation and market outreach. We write much and contribute much, all of which we document on vehicles such as our blogs, commercial Web site, or TechWiki knowledge base. New market participants need to learn and need to be armed with material and arguments for their own internal constituencies. Insofar as we are the agents making these arguments, we also get perceived as knowledgeable subject matter experts in the semantic technology space.

I have talked in my Of Flagpoles and Fishes article of the challenges of marketing to a nascent market where most early sales prospects remain hidden. At this stage in the market, our best approach is to share and communicate with new market prospects in a credible and helpful way. Then, we hope that some of those seeking more information are also in a position to commission real work. If we are at all instrumental in those early investigations, we are likely to be considered as a potential vendor to fulfill the commercial need.

Of course, each new engagement in the marketplace means new lessons and new applications. Thus, too, it is important that we become archivists as well. We need to capture those lessons and feed them back to the marketplace in a virtuous circle of learning, sharing, and further market expansion. Targeted messages delivered by credible messengers are the keys to unlocking the semantic technologies market.

NOTE: This is part of an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] Simply conduct a search on http://www.mkbergman.com/?s=interoperability+integration to see how frequently this topic is a focus of my articles.

Posted by AI3's author, Mike Bergman

Posted on January 14, 2013 at 9:55 am in Enterprise-scale Semantic Systems, Linked Data, Searching, Semantic Enterprise, Structured Dynamics | Comments (1)
The URI link reference to this post is: http://www.mkbergman.com/1605/three-leading-arguments-for-semantic-technologies/
Date:   January 8, 2013

The Semantic Enterprise Introduction: Part 1 of a New Series

For about the past two years, Fred Giasson and I have had the good fortune to work with some cutting-edge, reference enterprise deployments of semantic technologies. Our work at Structured Dynamics is about to see the light of day from these initiatives, which our sponsors will be unveiling shortly. These efforts in enterprise-scale systems have been eye opening. One eye has been opened with respect to how semantic technologies need to integrate and adapt to existing enterprise practices and deployments. The other eye has been opened with respect to how semantic technologies need to be presented and sold to internal enterprise stakeholders.

Our emerging series on enterprise-scale systems, which this first article introduces, attempts to package and share the lessons we have gained from these enterprise-scale deployments. This series marks a subtle but substantive shift from many of my prior writings. Those earlier writings over the past five to six years represent an attempt to introduce and describe some of the key underpinnings of semantic technologies and the mindsets behind them. Probably the best summary of these earlier messages resides in my article on the Seven Pillars of the Open Semantic Enterprise, published nearly three years ago today.

But logic, theory and foundational descriptions can only go so far. Ultimately, if the constructs at the core of these semantic technologies are to be realized, the conceptual needs to be brought down to the practical. Those are the efforts that Fred and I have been pursuing for the past two years, and those are the efforts that this new series on enterprise-scale semantic systems attempts to capture.

A Bit of History

The foundational underpinnings to the semantic Web, upon which semantic enterprise technologies are based [1], extend back now nearly 12 to 15 years. Predecessor data models to RDF were being described in the late 1990s (actually, even earlier, but not within a Web framework), with the foundational Resource Description Framework approach first promulgated as a standard by the World Wide Web Consortium (W3C) in 1999. The first social semantic Web vocabulary, FOAF, was started in 2000. Schema extensions to RDF and then the Web Ontology Language (OWL) for formalizing semantic Web schema were first published in 2004. Many related efforts in supporting formats followed soon thereafter, including some of the baseline semantic Web vocabularies such as SKOS. This decade or more of effort has now resulted in a rich set of vocabularies, standards and best practices.

Many in the community look to the techniques associated with linked data and the complementary project to make Wikipedia content accessible as structured data via DBpedia as the essential turning point for the semantic Web. The so-called linked open data (LOD) voice has become a prominent one within the community. While we embrace linked data techniques and believe them often to be best practices, our own experience indicates that linked data alone is not a key driver to enterprise adoption of semantic technologies. In our experience, linked data advocacy may be best characterized as neutral to negative in helping to foster enterprise semantic technology adoption.

At a consumer level, efforts from DBpedia to Siri and semantic search and various structured data initiatives are validating the use of semantic technologies as foundational elements to many current information architectures. The use of semantic knowledge graphs and graph-oriented databases (DBs) and structures is, for example, pervasive to many standard Web offerings from Google to Facebook. These adoptions tend to be incremental and subtle; looking at the functional and relational capabilities of major Web properties today in comparison with their same offerings of even a few years ago verifies these transitions.

But, generally, these semantic transitions are subtle and incremental. Semantic technologies are not expressing themselves as revolutionary new “killer apps” or in-your-face differences. Rather, they are background improvements that act to better inform how we find stuff and what relevant information gets presented to us.

Semantic technology advocates often appear to have been caught in a vise of their own making. On the one hand, “revolutionary” improvements in data access and management were promised, the idea being that data would have an impact equivalent to the initial adoption of the Web. On the other hand, since such huge data management changes have not been glaringly evident, it is clear that semantic technologies have not achieved that vaunted potential. The advocate’s strawman has apparently not appeared, and so can be neither knocked down nor recognized.

The seeming “failure” of semantic technologies to achieve their advocated potential leads both to occasional expressions of despair over why that “failure” has occurred and to strident arguments for certain community-focused advocacies, such as linked data. From an enterprise perspective, almost all of this seems quite parochial and beside the point. From the customer perspective, data integration and usefulness is the driving motivation, not linked data.

The “drivers” for semantic technologies in the enterprise remain the same as they have been for decades: better integration of existing and desired information assets at lower cost and with better insight for business purposes. These are the metrics of interest to enterprise decisionmakers, not the internal advocacy positions of linked data academics. We thus have the strange confluence of the market embracing and accepting semantic relationships within data while advocates perceive a lack of adoption.

All of this is occurring within the backdrop of software development shifting from the past few years of consumer prominence to the re-emergence of enterprise uses. Though stupid and overstated, some have expressed this enterprise shift as a trillion dollar opportunity. For sure, the opportunity is big, but no one believes the incumbent enterprise providers will be overcome easily and much enterprise stuff will be shared from the consumer side of things. But, in any event, the enterprise market opportunity remains compelling.

Some Good News

Like all good market opportunities, there is much positive to discover once one scratches below the initial semantic surface. The same compelling needs for data integration and interoperability that have been a commonplace of the enterprise market for more than three decades remain today. Information use of all kinds within the enterprise sucks, and has now for many decades. The litany of information management failures in the enterprise extends from stovepiped data silos to lack of use of internal unstructured data (documents) and the virtual lack of integration with external information sources. It has taken the massive success of the Web and its distributed model of resources to make clear just how dysfunctional most enterprise information systems truly are.

Though certainly not yet evident to all or even most, it is clear that the promises in the logic, theory and foundational bases of semantic technologies are real and are relevant. The RDF data model works, as does the use of ontologies as governing schema. Natural language processing (NLP) techniques married to RDF have made unstructured documents equal first class citizens with structured data. Open world approaches are showing how schema and integration development can be incremental and cost effective, while overcoming past brittleness in how to organize and manage information. Web-oriented architectures are proving the same benefits to the enterprise as is shown on the broader public Web. These architectures and rather simple connectors or RDFizers are showing how legacy systems and assets can be leveraged in place to transition to a semantic future without undue cost or disruption to existing practices.

Daily we see success of semantic technologies in multiple locations, and the market is coming to understand the uses and potential benefits. The benefits of graph-based knowledge structures in search and recommendation systems are becoming accepted. We see how basic search is being enhanced with entity recognition and characterization, as well as richer links between entities. The ability of the RDF data model and ontologies to act as integration frameworks is no longer an assertion, but a fact. Despite the despair of some semantic advocates, the market place is increasingly understanding the concepts and potential of semantic technologies.

There is real and good money to be made in this marketplace. Fred and I turn away work and our company, Structured Dynamics, has been self-financed and profitable since its inception. Our revenues increase substantially each year and we have significant monies banked to protect us in a downturn or to fund our own initiatives. No one owns any portion of our company and we are not in debt or obligated to follow the directives of any venture firm. Even with our profitable operations, we still offer cheaper and faster ways for enterprises to achieve their information objectives than through conventional means.

Some Realistic News

Semantic technologies have not yet reached the point of fulfilling their own prophecy nor of being sufficiently buzz-worthy to fuel their own demand. Enterprise customers are intrigued with the idea of semantic solutions, but still need to be convinced. Better search is often the telling leverage point in the sale. Enterprises do not appear to be interested in linked data alone (if at all), though some like the idea of possibly contributing linked data back to others. In any event, linked data (at least in our experience) is not a material factor to the sale.

The material factors to a sale have been data integration and interoperability, fulfilled through a distributed Web architecture that is now apparent all around us. Yet, even despite a general positive predilection to semantic technologies, the conceptual and technology transfer barriers to overcome are quite daunting. I pride myself on being able to communicate complicated ideas and concepts relatively simply. But, despite hundreds of pages of documentation and many polished write-ups (see, for example, our TechWiki and the chronology of this blog), semantic concepts are not (generally) intuitive to content editors, information architects, project managers or fellow developers or project vendors. It is absolutely imperative to engage in continuous training and knowledge transfer during a semantic deployment.

These imperatives increase when multiple parties and components are being brought to bear for a large-scale enterprise deployment. Each part of the puzzle, from portal and content management system to middleware to security to information repository, has its own lingo and concepts for quite similar things. Because of the central role of semantics to these integration problems, it is critical that concepts in all legacy areas be properly “mapped” to the terminology and concepts of the semantic solution. One component’s ‘entity’ is another component’s ‘instance’; one component’s ‘schema’ is another component’s ‘ontology’.

Inter-team communications must be grounded in shared vocabulary and concepts. Yet, even then, it is still necessary to continuously describe and explicate the benefits due to semantic approaches over conventional ones. Because of their general foundational nature, semantic approaches are often hidden or at the core of the information solution. It is not always self-evident what the advantages of semantic approaches are, because their results can be mimicked via conventional approaches (though at greater cost with greater brittleness).

In no instance are we aware of enterprises having much interest in public data, except as judicious supplements. Most all information challenges are based on private, internal data, with much concern over security and access. Where public data enters the equation, it is from very limited sources of excellent quality and provenance. Thus, information solutions geared to the enterprise must have security and differential access baked into the cake, and not be an afterthought. In this regard, the semantic enterprise is quite unlike the semantic Web. Interoperability, data quality and data reliability take huge precedence over such ideas as serendipity or follow your nose, advantages often put forward in a public Web context.

Unlike just a few years back, we no longer see resistance to open source solutions. In fact, for early semantic adopters, open source is a positive feature. But with open source in a complicated enterprise environment comes its own challenges. Support is often poor and integrating the pieces becomes one of the key project responsibilities and risks. Simple assertions of open APIs and a commitment to Web service endpoints still can lead to significant integration challenges. Encoding mismatches or how error messages get generated or treated, as two examples, point to some of the challenges in creating an integrated enterprise environment from multiple open source pieces.

Though enterprise funding sure beats the funding behind most consumer-oriented projects, enterprise IT budgets have also come under their own pressures. The justification for many projects resides in being able to offset annual licensing and maintenance fees, which can impose delivery constraints based on renewal dates. Existing enterprise IT budgets have also been made more incremental, with milestone achievements often required for moving forward. These trends are putting a premium on agile development and the need for enterprise-scale deployment and testing tools. Repeatable build processes and scripts are now an essential component of complicated stack deployments.

Many of the issues that emerge in enterprise deployments are ancillary to or independent of specific semantic components. Logging, testing, security, access, service buses and deployment builds are an umbrella over entire deployments. In these regards, incorporation of semantic technologies means that these contributions, too, must adhere to enterprise build practices and standards. This works to put a premium on repeatable build and testing scripts and improved deployment documentation and practices.

What all this means is that semantic technologies and practices need to grow up to adhere to standard enterprise practices, which themselves are undergoing rapid change as incremental, agile development becomes more prevalent. Much of what SD has learned in the past two years relates to the development and deployment environments that both aid and govern modern enterprise IT projects. In these regards, semantic technologies are merely another set of components in a broader, enterprise-wide stack.

Lastly, another reality of semantic technologies in the enterprise is that there are precious few champions and advocates within any given enterprise. Means must be found to communicate to semantic newbies and to enlist the aid of these champions in carrying the message forward within their organizations. In multi-vendor deployment environments it is important to find single points of contact that can also help communicate with their colleagues.

What is to Come

These general points set the context for some of the specifics in the series to come. Attention will be given in the series to a number of topics, not necessarily in this order nor scope:

  • Three leading arguments for semantic technologies
  • Architecting semantic technologies for the enterprise
  • Making text a first-class citizen
  • The primacy of search
  • Access control using datasets
  • Adoption and use of comprehensive deployment and testing environments (e.g., JIRA, Confluence, Bamboo)
  • Harvesting and ETL considerations
  • Authoring
  • Workflow integration and differences posed by semantic technologies
  • Security considerations with semantic technologies
  • Automated testing
  • Continuous integration
  • Working with a Web-oriented architecture and endpoints
  • Multiple, modular ontologies to capture enterprise schema
  • Tech transfer and tools for ontology growth and development
  • Integrating semantic technologies into publishing platforms
  • Version control for semantic data and ontologies
  • Gaps in enterprise-readiness of semantic technologies (bespoke tools, etc.)
  • Challenges in communicating the benefits of semantic technologies
  • Broader challenges in adoption of semantic technologies.

As appropriate, these topics will be addressed in forthcoming installments in this series. We will be culminating this series with overviews of two enterprise initiatives with high visibility for which Structured Dynamics has been the lead semantics contractor.

NOTE: This is the first part in an ongoing series on enterprise-scale semantic systems (ESSS), which has its own category on this blog. Simply click on that category link to see other articles in this series.

[1] With the growing popularity of semantic technologies, many entities claim the “semantic” mantle based solely on ways to negotiate the meaning (“semantics”) of various concepts. This effort is legitimate, but tends to undercut a more strict interpretation. As we use herein, and have for some time, semantic technologies means the explicit use of the semantic Web languages and specifications adopted by the W3C along with the various software applications to use them.

Posted by AI3's author, Mike Bergman

Posted on January 8, 2013 at 10:45 am in Enterprise-scale Semantic Systems, Linked Data, Semantic Enterprise, Structured Dynamics | Comments (0)
The URI link reference to this post is: http://www.mkbergman.com/1037/enterprise-scale-semantic-systems/
Date:   July 9, 2012
Abrogans; earliest glossary (from Wikipedia)

There are many semantic technology terms relevant to the context of a semantic technology installation [1]. Some of these are general terms related to language standards, as well as to ontologies or the dataset concept.

ABox
An ABox (for assertions, the basis for A in ABox) is an “assertion component”; that is, a fact associated with a terminological vocabulary within a knowledge base. ABox statements are TBox-compliant statements about instances belonging to the concepts of an ontology.
Adaptive ontology
An adaptive ontology is a conventional knowledge representational ontology that has added to it a number of specific best practices, including modeling the ABox and TBox constructs separately; information that relates specific types to different and appropriate display templates or visualization components; use of preferred labels for user interfaces, as well as alternative labels and hidden labels; defined concepts; and a design that adheres to the open world assumption.
Administrative ontology
Administrative ontologies govern internal application use and user interface interactions.
Annotation
An annotation, specifically as an annotation property, is a way to provide metadata or to describe vocabularies and properties used within an ontology. Annotations do not participate in reasoning or coherency testing for ontologies.
Atom
The name Atom applies to a pair of related standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (APP for short) is a simple HTTP-based protocol for creating and updating Web resources.
Attributes
These are the aspects, properties, features, characteristics, or parameters that objects (and classes) may have. They are the descriptive characteristics of a thing. Key-value pairs match an attribute with a value; the value may be a reference to another object, an actual value or a descriptive label or string. In an RDF statement, an attribute is expressed as a property (or predicate or relation). In intensional logic, all attributes or characteristics of similarly classifiable items define the membership in that set.
Axiom
An axiom is a premise or starting point of reasoning. In an ontology, each statement (assertion) is an axiom.
Binding
Binding is the creation of a simple reference to something that is larger and more complicated and used frequently. The simple reference can be used instead of having to repeat the larger thing.
Class
A class is a collection of sets or instances (or sometimes other mathematical objects) which can be unambiguously defined by a property that all of its members share. In ontologies, classes may also be known as sets, collections, concepts, types of objects, or kinds of things.
Closed World Assumption
CWA is the presumption that what is not currently known to be true, is false. CWA also has a logical formalization. CWA is the most common logic applied to relational database systems, and is particularly useful for transaction-type systems. In knowledge management, the closed world assumption is used in at least two situations: 1) when the knowledge base is known to be complete (e.g., a corporate database containing records for every employee), and 2) when the knowledge base is known to be incomplete but a “best” definite answer must be derived from incomplete information. See contrast to the open world assumption.
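As a rough illustration of the contrast, the sketch below answers the same question under each assumption. It is a toy in plain Python, not a reasoner: the fact set and function names are hypothetical, and a real open world system can also derive "false" from explicit negative statements, which this sketch leaves out.

```python
# Hypothetical facts: the only employment records we happen to hold.
known_facts = {("alice", "worksFor", "acme")}

def ask_closed_world(fact):
    """Closed world: anything not recorded is taken to be false."""
    return fact in known_facts

def ask_open_world(fact):
    """Open world: anything not recorded is simply unknown (None)."""
    return True if fact in known_facts else None

query = ("bob", "worksFor", "acme")
closed_answer = ask_closed_world(query)  # a definite "no"
open_answer = ask_open_world(query)      # None, meaning "unknown"
```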
Data Space
A data space may be personal, collective or topical, and is a virtual “container” for related information irrespective of storage location, schema or structure.
Dataset
An aggregation of similar kinds of things or items, mostly comprised of instance records.
DBpedia
A project that extracts structured content from Wikipedia, and then makes that data available as linked data. There are millions of entities characterized by DBpedia in this way. As such, DBpedia is one of the largest — and most central — hubs for linked data on the Web.
DOAP
DOAP (Description Of A Project) is an RDF schema and XML vocabulary to describe open-source projects.
Description logics
Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.
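The split can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the concept and property names are hypothetical, triples are ordinary tuples, and the "compliance" check is deliberately simplistic compared with what a description logic reasoner does.

```python
# TBox: schema-level statements about concepts (hypothetical names).
tbox = {
    ("Employee", "subClassOf", "Person"),
    ("worksFor", "domain", "Employee"),
}

# ABox: assertions about individual instances.
abox = {
    ("alice", "type", "Employee"),
    ("alice", "worksFor", "acme"),
}

def tbox_concepts(tbox):
    """Collect every concept named in a subClassOf or domain statement."""
    concepts = set()
    for s, p, o in tbox:
        if p == "subClassOf":
            concepts.update({s, o})
        elif p == "domain":
            concepts.add(o)
    return concepts

def abox_is_compliant(abox, tbox):
    """Check that every 'type' assertion names a concept the TBox defines."""
    concepts = tbox_concepts(tbox)
    return all(o in concepts for s, p, o in abox if p == "type")

# Taken together, the TBox and ABox statements form the knowledge base.
knowledge_base = tbox | abox
```

Keeping the two sets separate means instance data can be loaded, replaced or validated without touching the schema statements.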
Domain ontology
Domain (or content) ontologies embody more of the traditional ontology functions such as information interoperability, inferencing, reasoning and conceptual and knowledge capture of the applicable domain.
Entity
An individual object or member of a class; when affixed with a proper name or label is also known as a named entity (thus, named entities are a subset of all entities).
Entity–attribute–value model
EAV is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In the EAV data model, each attribute-value pair is a fact describing an entity. EAV systems trade off simplicity in the physical and logical structure of the data for complexity in their metadata, which, among other things, plays the role that database constraints and referential integrity do in standard database designs.
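A minimal sketch of the idea, assuming single-valued attributes and hypothetical entity and attribute names, stores one fact per row and pivots the rows back into a record on demand:

```python
# Each row is one (entity, attribute, value) fact; names are hypothetical.
eav_rows = [
    ("patient:1", "blood_type", "O+"),
    ("patient:1", "allergy", "penicillin"),
    ("patient:2", "blood_type", "A-"),
]

def record_for(rows, entity):
    """Pivot the EAV rows for one entity into an attribute -> value dict.

    Assumes each attribute appears at most once per entity.
    """
    return {attr: value for ent, attr, value in rows if ent == entity}

# Each entity carries only the attributes that apply to it; there are
# no empty columns for attributes it lacks.
patient_1 = record_for(eav_rows, "patient:1")
patient_2 = record_for(eav_rows, "patient:2")
```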
Extensional
The extension of a class, concept, idea, or sign consists of the things to which it applies, in contrast with its intension. For example, the extension of the word “dog” is the set of all (past, present and future) dogs in the world. The extension is most akin to the attributes or characteristics of the instances in a set defining its class membership.
FOAF
FOAF (Friend of a Friend) is an RDF schema for machine-readable modeling of homepage-like profiles and social networks.
Folksonomy
A folksonomy is a user-generated set of open-ended labels called tags organized in some manner and used to categorize and retrieve Web content such as Web pages, photographs, and Web links.
GeoNames
GeoNames integrates geographical data such as names of places in various languages, elevation, population and others from various sources.
GRDDL
GRDDL is a markup format for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT.
High-level Subject
A high-level subject is both a subject proxy and category label used in a hierarchical subject classification scheme (taxonomy). Higher-level subjects are classes for more atomic subjects, with the height of the level representing broader or more aggregate classes.
Individual
See Instance.
Inferencing
Inference is the act or process of deriving logical conclusions from premises known or assumed to be true. The logic within and between statements in an ontology is the basis for inferring new conclusions from it, using software applications known as inference engines or reasoners.
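To give a feel for what an inference engine does, here is a toy forward-chaining sketch in plain Python. It applies just two rules (subclass transitivity and type inheritance) to hypothetical triples; production reasoners support far richer logics and are much more efficient.

```python
def infer_types(triples):
    """Forward-chain two rules until no new triples appear:
    1. subClassOf is transitive.
    2. An instance of a class is also an instance of its superclasses.
    """
    inferred = set(triples)
    while True:
        new = set()
        for s1, p1, o1 in inferred:
            for s2, p2, o2 in inferred:
                # Only pair a statement with a subClassOf statement whose
                # subject matches the first statement's object.
                if p2 != "subClassOf" or o1 != s2:
                    continue
                if p1 == "subClassOf":
                    new.add((s1, "subClassOf", o2))
                elif p1 == "type":
                    new.add((s1, "type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

# Hypothetical asserted statements.
asserted = {
    ("Manager", "subClassOf", "Employee"),
    ("Employee", "subClassOf", "Person"),
    ("alice", "type", "Manager"),
}
closure = infer_types(asserted)
```

Nothing in `asserted` says that alice is a Person; that conclusion appears in `closure` only because the rules derive it from the premises.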
Instance
Instances are the basic, “ground level” components of an ontology. An instance is an individual member of a class, and the term is also used synonymously with entity. The instances in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract instances such as numbers and words. An instance is also known as an individual, with member and entity also used somewhat interchangeably.
Instance record
An instance with one or more attributes also provided.
irON
irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF (Resource Description Framework) triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF.
Intensional
The intension of a class is what is intended as a definition of what characteristics its members should have; it is akin to a definition of a concept and what is intended for a class to contain. It is therefore like the schema aspects (or TBox) in an ontology.
Key-value pair
Also known as a name–value pair or attribute–value pair, a key-value pair is a fundamental, open-ended data representation. All or part of the data model may be expressed as a collection of tuples <attribute name, value> where each element is a key-value pair. The key is the defined attribute and the value may be a reference to another object or a literal string or value. In RDF triple terms, the subject is implied in a key-value pair by nature of the instance record at hand.
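The implied subject can be shown with a short sketch. The record, the `ex:` identifiers and the function name are all hypothetical; the only point is that each key-value pair becomes a full triple once the instance record supplies the subject.

```python
def record_to_triples(subject, record):
    """Expand an instance record's key-value pairs into (s, p, o) triples.

    The subject is not stored in any pair; it is implied by the record.
    """
    return [(subject, key, value) for key, value in record.items()]

# Hypothetical instance record: a value may be a literal string or a
# reference to another object.
record = {"name": "Alice", "worksFor": "ex:acme"}
triples = record_to_triples("ex:alice", record)
```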
Kind
Used synonymously herein with class.
Knowledge base
A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized. Formally, the combination of a TBox and ABox is a knowledge base.
Linkage
A specification that relates an object or attribute name to its full URI (as required in the RDF language).
Linked data
Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model, and uses uniform resource identifiers (URIs) to name the data objects. The approach exposes the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.
Mapping
A considered correlation of objects in two different sources to one another, with the relation between the objects defined via a specific property. Linkage is a subset of possible mappings.
Member
Used synonymously herein with instance.
Metadata
Metadata (metacontent) is supplementary data that provides information about one or more aspects of the content at hand such as means of creation, purpose, when created or modified, author or provenance, where located, topic or subject matter, standards used, or other annotation characteristics. It is “data about data”, or the means by which data objects or aggregations can be described. Contrasted to an attribute, which is an individual characteristic intrinsic to a data object or instance, metadata is a description about that data, such as how or when created or by whom.
Metamodeling
Metamodeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems.
Microdata
Microdata is a proposed specification used to nest semantics within existing content on web pages. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa or microformats.
Microformats
A microformat (sometimes abbreviated μF or uF) is a piece of mark up that allows expression of semantics in an HTML (or XHTML) web page. Programs can extract meaning from a web page that is marked up with one or more microformats.
Natural language processing
NLP is the process of a computer extracting meaningful information from natural language input and/or producing natural language output. NLP is one method for assigning structured data characterizations to text content for use in semantic technologies. (Hand assignment is another method.) Some of the specific NLP techniques and applications relevant to semantic technologies include automatic summarization, coreference resolution, machine translation, named entity recognition (NER), question answering, relationship extraction, topic segmentation and recognition, word segmentation, and word sense disambiguation, among others.
OBIE
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. Ontology-based information extraction (OBIE) is the use of an ontology to inform a “tagger” or information extraction program when doing natural language processing. Input ontologies thus become the basis for generating metadata tags when tagging text or documents.
Ontology
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. Loosely defined, ontologies on the Web can have a broad range of formalism, or expressiveness or reasoning power.
Ontology-driven application
Ontology-driven applications (or ODapps) are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules.
Open Semantic Framework
The open semantic framework, or OSF, is a combination of a layered architecture and an open-source, modular software stack. The stack combines many leading third-party software packages with open source semantic technology developments from Structured Dynamics.
Open World Assumption
OWA is a formal logic assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true. OWA is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption. The OWA limits the kinds of inference and deductions an agent can make to those that follow from statements that are known to the agent to be true. OWA is useful when we represent knowledge within a system as we discover it, and where we cannot guarantee that we have discovered or will discover complete information. In the OWA, statements about knowledge that are not included in or inferred from the knowledge explicitly recorded in the system may be considered unknown, rather than wrong or false. Semantic Web languages such as OWL make the open world assumption. Contrast with the closed world assumption.
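The practical difference between the two assumptions shows up when a query asks about something that was never recorded. The sketch below is a deliberately small illustration, not how an OWL reasoner works: the fact set and the ask_closed and ask_open helpers are hypothetical, and a missing statement is modeled as None ("unknown").

```python
# A tiny set of recorded statements (hypothetical data).
facts = {("Alice", "worksFor", "Acme")}


def ask_closed(statement, known=facts):
    """Closed world: anything not recorded is taken to be false."""
    return statement in known


def ask_open(statement, known=facts):
    """Open world: anything not recorded is unknown (None), not false."""
    return True if statement in known else None


query = ("Bob", "worksFor", "Acme")
print("closed world:", ask_closed(query))
print("open world:", ask_open(query))
```

Under the closed world reading the system denies that Bob works for Acme; under the open world reading it only reports that it does not know.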
OPML
OPML (Outline Processor Markup Language) is an XML format for outlines, and is commonly used to exchange lists of web feeds between web feed aggregators.
OWL
The Web Ontology Language (OWL) is designed for defining and instantiating formal Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. There are also a variety of OWL dialects.
Predicate
See Property.
Property
Properties are the ways in which classes and instances can be related to one another. A property is thus a relationship, and is also known as a predicate. Properties are used to define an attribute relation for an instance.
Punning
In computer science, punning refers to a programming technique that subverts or circumvents the type system of a programming language, by allowing a value of a certain type to be manipulated as a value of a different type. When used for ontologies, it means to treat a thing as both a class and an instance, with the use depending on context.
RDF
Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats. The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object.
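To make the triple structure concrete, here is a minimal sketch using only the Python standard library rather than an RDF toolkit. The Triple type, the "ex:" prefixed names, the sample graph and the match helper are all assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; "ex:" stands in for some namespace URI.
GRAPH = [
    Triple("ex:Alice", "ex:knows", "ex:Bob"),
    Triple("ex:Alice", "ex:name", "Alice"),
    Triple("ex:Bob", "ex:name", "Bob"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


for t in match(GRAPH, subject="ex:Alice"):
    print(t)
```

Leaving a position as a wildcard is the same idea that query languages for RDF build on: a pattern with some slots fixed and others free.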
RDFa
RDFa 1.0 is a set of extensions to XHTML that is a W3C Recommendation. RDFa uses attributes from meta and link elements, and generalizes them so that they are usable on all elements allowing annotation markup with semantics. A W3C Working draft is presently underway that expands RDFa into version 1.1 with HTML5 and SVG support, among other changes.
RDF Schema
RDFS or RDF Schema is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources.
Reasoner
A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms.
Reasoning
Reasoning is one of many logical tests using inference rules as commonly specified by means of an ontology language, and often a description logic language. Many reasoners use first-order predicate logic to perform reasoning; inference commonly proceeds by forward chaining or backward chaining.
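Forward chaining can be sketched in a few lines: start from the asserted facts and keep applying rules until nothing new appears. The example below is an illustrative toy, not a description of any particular reasoner; the two rules, the predicate names "subClassOf" and "type", and the sample facts are assumptions.

```python
def forward_chain(facts):
    """Apply two rules repeatedly until no new triples are produced."""
    inferred = set(facts)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    # Rule 1: subClassOf is transitive.
                    new.add((s, "subClassOf", o2))
                elif p == "type":
                    # Rule 2: an instance of a class is also an
                    # instance of that class's superclasses.
                    new.add((s, "type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


asserted = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Rex", "type", "Dog"),
}
closure = forward_chain(asserted)
print(sorted(closure - asserted))
```

Backward chaining would run the same rules in the other direction, starting from a question such as "is Rex an Animal?" and searching for facts that support it.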
Record
As used herein, a shorthand reference to an instance record.
Relation
Used synonymously herein with attribute.
RSS
RSS (an acronym for Really Simple Syndication) is a family of web feed formats used to publish frequently updated digital content, such as blogs, news feeds or podcasts.
schema.org
Schema.org is an initiative launched by the major search engines of Bing, Google and Yahoo!, and later joined by Yandex, in order to create and support a common set of schemas for structured data markup on web pages. schema.org provided a starter set of schema and extension mechanisms for adding to them. schema.org supports markup in microdata, microformat and RDFa formats.
Semantic enterprise
An organization that uses semantic technologies and the languages and standards of the semantic Web, including RDF, RDFS, OWL, SPARQL and others to integrate existing information assets, using the best practices of linked data and the open world assumption, and targeting knowledge management applications.
Semantic technology
Semantic technologies are a combination of software and semantic specifications that encodes meanings separately from data and content files and separately from application code. This approach enables machines as well as people to understand, share and reason with data and specifications separately. With semantic technologies, adding, changing and implementing new relationships or interconnecting programs in a different way can be as simple as changing the external model that these programs share. New data can also be brought into the system and visualized or worked upon based on the existing schema. Semantic technologies provide an abstraction layer above existing IT technologies that enables bridging and interconnection of data, content, and processes.
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a “web of data”. It builds on the W3C’s Resource Description Framework (RDF).
Semset
A semset is the use of a series of alternate labels and terms to describe a concept or entity. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept.
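One way to picture a semset is as a lookup from any alternate label back to the concept it names. The sketch below is an assumption for illustration: the SEMSETS table, its sample labels, and the build_index and concepts_for helpers are hypothetical, and matching is done by simple case-insensitive comparison.

```python
# Hypothetical semset: one concept with synonyms, jargon and slang.
SEMSETS = {
    "Automobile": {"automobile", "car", "auto", "motorcar", "wheels"},
}


def build_index(semsets):
    """Invert concept -> labels into label -> set of concepts."""
    index = {}
    for concept, labels in semsets.items():
        for label in labels:
            index.setdefault(label.casefold(), set()).add(concept)
    return index


INDEX = build_index(SEMSETS)


def concepts_for(term):
    """Return the concepts whose semset contains the term (may be empty)."""
    return INDEX.get(term.strip().casefold(), set())


print(concepts_for("Car"))
```

The index maps each label to a set of concepts because a loose label such as "wheels" could plausibly belong to more than one semset.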
SIOC
Semantically-Interlinked Online Communities Project (SIOC) is based on RDF and is an ontology defined using RDFS for interconnecting discussion methods such as blogs, forums and mailing lists to each other.
SKOS
SKOS or Simple Knowledge Organisation System is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary; it is built upon RDF and RDFS.
SKSI
Semantic Knowledge Source Integration provides a declarative mapping language and API between external sources of structured knowledge and the Cyc knowledge base.
SPARQL
SPARQL (pronounced “sparkle”) is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.
Statement
A statement is a “triple” in an ontology, which consists of a subject – predicate – object (S-P-O) assertion. By definition, each statement is a “fact” or axiom within an ontology.
Subject
A subject is always a noun or compound noun and is a reference or definition to a particular object, thing or topic, or groups of such items. Subjects are also often referred to as concepts or topics.
Subject extraction
Subject extraction is an automatic process for retrieving and selecting subject names from existing knowledge bases or data sets. Extraction methods involve parsing and tokenization, and then generally the application of one or more information extraction techniques or algorithms.
Subject proxy
A subject proxy is a canonical name or label for a particular object; other terms or controlled vocabularies may be mapped to this label to assist disambiguation. A subject proxy is always representative of its object but is not the object itself.
Tag
A tag is a keyword or term associated with or assigned to a piece of information (e.g., a picture, article, or video clip), thus describing the item and enabling keyword-based classification of information. Tags are usually chosen informally by either the creator or consumer of the item.
TBox
A TBox (for terminological knowledge, the basis for T in TBox) is a “terminological component”; that is, a conceptualization associated with a set of facts. TBox statements describe a conceptualization, a set of concepts and properties for these concepts. The TBox is sufficient to describe an ontology (best practice often suggests keeping a split between the instance records, the ABox, and the TBox schema).
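The split between schema and instance records can be sketched by partitioning a set of statements. This is a simplification under stated assumptions: the SCHEMA_PREDICATES set, the sample knowledge base and the split_kb helper are hypothetical, and real ontologies distinguish TBox from ABox by the kind of axiom, not merely by predicate name.

```python
# Assumed set of predicates that make schema-level (TBox) statements.
SCHEMA_PREDICATES = {"subClassOf", "domain", "range"}

# Hypothetical knowledge base mixing schema and instance statements.
kb = [
    ("Dog", "subClassOf", "Mammal"),
    ("hasOwner", "domain", "Dog"),
    ("Rex", "type", "Dog"),
    ("Rex", "hasOwner", "Alice"),
]


def split_kb(triples):
    """Partition triples into (tbox, abox) by their predicate."""
    tbox = [t for t in triples if t[1] in SCHEMA_PREDICATES]
    abox = [t for t in triples if t[1] not in SCHEMA_PREDICATES]
    return tbox, abox


tbox, abox = split_kb(kb)
print("TBox:", tbox)
print("ABox:", abox)
```

Keeping the two parts separate lets the schema be maintained and versioned on its own while instance records are loaded, replaced or refreshed independently.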
Taxonomy
In the context of knowledge systems, taxonomy is the hierarchical classification of entities of interest of an enterprise, organization or administration, used to classify documents, digital assets and other information. Taxonomies can cover virtually any type of physical or conceptual entities (products, processes, knowledge fields, human groups, etc.) at any level of granularity.
Topic
The topic (or theme) is the part of the proposition that is being talked about (predicated). In topic maps, the topic may represent any concept, from people, countries, and organizations to software modules, individual files, and events. Topics and subjects are closely related.
Topic Map
Topic maps are an ISO standard for the representation and interchange of knowledge. A topic map represents information using topics, associations (similar to a predicate relationship), and occurrences (which represent relationships between topics and information resources relevant to them), quite similar in concept to the RDF triple.
Triple
A basic statement in the RDF language, which is comprised of a subject – property – object construct, with the subject and property (and object optionally) referenced by URIs.
Type
Used synonymously herein with class.
UMBEL
UMBEL, short for Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts, designed to provide common mapping points for relating different ontologies or schema to one another, and a vocabulary for aiding that ontology mapping, including expressions of likelihood relationships distinct from exact identity or equivalence. This vocabulary is also designed for interoperable domain ontologies.
Upper ontology
An upper ontology (also known as a top-level ontology or foundation ontology) is an ontology that describes very general concepts that are the same across all knowledge domains. An important function of an upper ontology is to support very broad semantic interoperability between a large number of ontologies that are accessible “under” this upper ontology.
Vocabulary
A vocabulary, in the sense of knowledge systems or ontologies, is a controlled vocabulary. Such vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other forms of knowledge organization systems.
WordNet
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools can be downloaded and used freely. Multiple language versions exist, and WordNet is a frequent reference structure for semantic applications.
YAGO
“Yet another great ontology” is a WordNet structure placed on top of Wikipedia.

[1] This glossary is based on the one provided on the OSF TechWiki. For the latest version, please refer to this link.

Posted by AI3's author, Mike Bergman

Posted on July 9, 2012 at 12:32 pm in Adaptive Information, Linked Data, Ontologies, Semantic Enterprise, Semantic Web | Comments (5)
The URI link reference to this post is: http://www.mkbergman.com/1017/glossary-of-semantic-technology-terms/
The URI to trackback this post is: http://www.mkbergman.com/1017/glossary-of-semantic-technology-terms/trackback/
Copyright © 2004–2013 Michael K. Bergman.   This work is licensed under a Creative Commons License