Experiments in understanding and QA of a very large Ontology

Prof. Alan Rector

This event took place on 23rd September 2010 at 12:00pm (11:00 GMT)
Knowledge Media Institute, Berrill Building, The Open University, Milton Keynes, United Kingdom, MK7 6AA

SNOMED-CT is a very large (450,000-concept) terminology based on a subset of description logic. Until recently, it was published only in "classified" form in a set of distribution tables. Although everybody knows the hierarchies contain many anomalies, it has been almost impossible to comment on them. Recently the "stated form" has been published, together with a script for transforming it into OWL. At the same time a group of hospitals has published a list of the most commonly used codes for "problems" - the Core Problem List Subset. Using the module extraction mechanism in the OWL API, with the subset as a signature, a module can be extracted from the stated form that is guaranteed to classify the subset in the same way as it would be classified in the full SNOMED, but in an ontology of only 35,000 concepts. The newly released SNOROCKET (an optimised EL++ classifier) classifies the subset in about 30 seconds, making iterative exploration and modification possible.

Using this module we have begun to develop methods to explore the core subset in combination with two projects. We have begun by taking common key concepts of importance to users and looking up the hierarchies to see how they were classified, then looking for cases analogous to any problems found. We call the method "analysis by repair". Issues discovered range from simple omissions to gross errors in the ontology schemas for anatomy. Only a few are evident locally without classification.

We have found the Protege inferred class hierarchy the best screening tool for looking up hierarchies and the OWLViz tool the best definitive tool. Usually, but not always, a complex tangled upwards hierarchy indicates problems. We are just starting to explore OPPL to find patterns. Performing the task on a large scale requires improved tools.
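As a rough illustration of what "looking up the hierarchy" from a key concept might involve, the sketch below walks upwards through a toy inferred hierarchy and counts how many concepts on the way have more than one direct parent. It is only a minimal sketch, not the method or tooling described in the talk: the concept names, the PARENTS table and the "tangle score" measure are all assumptions made up for this example, and a high score is at most a hint that a hierarchy deserves a closer look.

```python
from collections import deque

# Toy inferred hierarchy: concept -> list of direct parents.
# All names are illustrative assumptions, not actual SNOMED-CT content.
PARENTS = {
    "Fracture of femur": ["Fracture of bone", "Disorder of femur"],
    "Fracture of bone": ["Injury", "Disorder of bone"],
    "Disorder of femur": ["Disorder of bone"],
    "Disorder of bone": ["Disorder"],
    "Injury": ["Disorder"],
    "Disorder": [],
}


def ancestors(concept, parents):
    """Return every concept reachable by following parent links upwards."""
    seen = set()
    queue = deque(parents.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def tangle_score(concept, parents):
    """Count concepts in the upward hierarchy (start concept included)
    that have more than one direct parent. Hypothetical screening measure."""
    nodes = ancestors(concept, parents) | {concept}
    return sum(1 for node in nodes if len(parents.get(node, [])) > 1)


if __name__ == "__main__":
    start = "Fracture of femur"
    print(sorted(ancestors(start, PARENTS)))
    print(tangle_score(start, PARENTS))
```

In practice the parent links would come from a classified ontology rather than a hand-written table, and a visual tool remains the better way to inspect any hierarchy the score singles out.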

While this sub-project focuses on an ontology used for terminology, the context is that we wish to use such terminologies as just one small piece of a much larger programme: a hybrid ontology-based architecture that clearly distinguishes domain ontologies, such as SNOMED, from ontologies describing the use of information and from the data structures for that information, and that uses a variety of reasoning techniques.

(Due to unforeseen circumstances we were unable to record or webcast this event. We apologise to those who were unable to attend in person.)

The webcast was open to 100 users
