Terminology Sharing White paper

From IHE Wiki
Revision as of 22:49, 11 November 2007 by Christel (talk | contribs) (→‎The Problem)

Proposed Profile: Terminology Sharing White paper

  • Proposal Editor: Christel Daniel (AP-HP, INSERM, Paris)
  • Editor: Christel Daniel (AP-HP, INSERM, Paris), Ana Esterlich (GIP-DMP), François Gareil (Thales), Charles Rica (GIP DMP), Karima Bourquard (GMSIH), Jean Delahousse (Mondeca), Pierre Zweigenbaum (LIMSI, CNRS)
  • Date: N/A (Wiki keeps history)
  • Version: N/A (Wiki keeps history)
  • Domain: <IT infrastructure>

Summary and Scope

To achieve interoperability within and across disparate healthcare IT systems, the terminological resources used to index both narrative and structured clinical information need to be synchronized across the various applications (at the local, regional, national or international level).

Existing standards (e.g. Common Terminology Services) address the issue of sharing terminology resources.

This white paper extends the Integration Profile “Sharing Value Sets” to address the following issues:

- Distributing existing terminologies (Code Systems): how existing terminological resources (Code Systems) are distributed to terminology-enabled IT applications. Business scenarios will describe terminology services for indexing and mining narrative documents, i.e. how Concept codes are distributed to IT applications for efficient use of Natural Language Processing (NLP) algorithms.

- Defining and updating Value Sets: how mappings between terminological resources (Code Systems) and data models (e.g. CDA r2-based templates or DICOM objects) are defined and revised.

- Defining and distributing terminology mappings: how mappings between existing terminological resources (Code Systems) are defined, distributed and revised, allowing for example the classification and aggregation of data at different levels.

- Terminology correction or authoring: how users of a terminology (Code System) submit unambiguous requests for corrections and extensions to Terminology Providers (e.g. the IHTSDO for SNOMED CT), how revisions to content are identified, distributed and integrated into running systems, and how users define and distribute local codes in addition to existing Code Systems.
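As an illustration of how a terminology mapping supports aggregation of data at a coarser level, the following minimal Python sketch counts coded records per target category. The code identifiers, the `MAPPING` table and the `aggregate` function are hypothetical and are not taken from any real Code System or from the profile itself.

```python
from collections import Counter

# Hypothetical mapping from fine-grained codes to a coarser classification.
# None of these identifiers belong to a real Code System.
MAPPING = {
    "FINE:essential-hypertension": "COARSE:hypertensive-diseases",
    "FINE:secondary-hypertension": "COARSE:hypertensive-diseases",
    "FINE:type-2-diabetes": "COARSE:diabetes",
}


def aggregate(codes, mapping=MAPPING):
    """Count records per target category; unmapped codes are listed separately."""
    counts = Counter()
    unmapped = []
    for code in codes:
        target = mapping.get(code)
        if target is None:
            unmapped.append(code)
        else:
            counts[target] += 1
    return counts, unmapped
```

Reporting unmapped codes rather than dropping them silently is a design choice of this sketch: it makes gaps in the mapping visible to whoever maintains it.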

Finally, this white paper will address the issue of synchronizing Terminology Servers (e.g. the local Terminology Server of a hospital and the national Terminology Server of the personal EHR).

This requires developing terminology services between:

- 1) Terminology Providers (e.g. the IHTSDO for SNOMED CT) and the Terminology Administrators in charge of terminology servers: Terminology Providers (using Terminology Creator Sources) distribute terminological resources (Code Systems) to the Terminology Administrators in charge of terminology servers (Terminology Repositories/Registries).

- 2) Terminology Administrators of terminology servers and Terminology Users of terminology-enabled IT applications: Terminology Administrators of terminology servers (Terminology/Mapping/Value Sets Repositories/Registries) distribute updated, inter-related Code Systems and Value Sets to Terminology Users of IT applications (Terminology Consumers).

- 3) Terminology Administrators in charge of different terminology servers (TS), i.e. synchronization between international, national, regional and local TS. At the level of a terminology server, Terminology Authors/Curators define and maintain the relationships between Code Systems and domain content models (Value Sets Source) and the relationships between different terminological resources (Code Systems) (Terminology Mappings Source). Terminology Administrators of terminology servers distribute these Value Sets, mappings and other extensions to existing Code Systems.


Task force:

- HSSP and CTS2 (see Standards and Systems section, below)

- In France, many hospitals are deploying a new generation of Clinical Information Systems, and the government is conducting a national-scale Personal Health Record (PHR) project. The GIP-DMP (the French public organization in charge of the PHR), the GMSIH (the Association in Charge of the Modernization of the Healthcare Information Systems), researchers from INSERM (the National Institute of Health and Medical Research) and CNRS (the National Center for Scientific Research), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. There is also strong interest from industry, namely from companies such as Thales, Mondeca and DBmotion.

The main challenge is to propose an operational solution for sharing the semantics of the EHR (narrative documents and structured documents based on domain content models such as CDA r2 or DICOM objects) while avoiding inconsistent implementations and heterogeneous technical solutions across domains.

The Problem

1) Semantic interoperability and semantic mining.

Existing clinical information systems contain heterogeneous data, much of it still in unstructured form, such as narrative reports (imaging reports or discharge summaries), notes and correspondence. A complete view is often lacking, since these data are scattered across different sub-systems within different departments or healthcare provider organizations and are not indexed through the consistent use of shared terminologies. Systems that share clinical data also have to share a well-defined and unambiguous knowledge of the meaning of these data. Accessing and using shared terminological resources is a common and necessary function for many healthcare IT applications. The new generation of Clinical Information Systems (HIS, EHR, personal EHR) has to make use of contributions from health informatics knowledge representation, such as biomedical ontologies (e.g. ontologies from the OBO Foundry or from BioPortal, SNOMED CT) and domain content models (such as HL7 RIM-based templates or CEN archetypes) [1,2]. Binding the domain content models used for structured data entry in a CIS to terminological resources does not only support the goal of semantic interoperability. A coherent knowledge view of clinical data is also useful for connecting these data with medical knowledge (e.g. published medical literature, clinical guidelines and decision support modules). It also allows the use of new semantic mining algorithms: using formal ontologies to index clinical information makes that information amenable to formal, classification-based reasoning.

2) Binding clinical data and terminology resources

a) Terminology systems

Issues concerning the representation of knowledge and the definition of terminology services for patient care and clinical research have been intensively discussed in the Medical Informatics community over the last two decades [3-7]. Existing biomedical terminology resources (vocabularies, thesauri, classifications, nomenclatures, ontologies, etc.) were often initially created for specific purposes and scopes. New approaches now aim at providing large-scale terminological resources for semantic interoperability, such as the Open Biomedical Ontologies (OBO) Foundry approach for the basic biomedical sciences [8], BioPortal, and the SNOMED CT clinical terminology for patient care [9-12]. Formal biomedical ontologies define classes or types of entities, whose properties are expressed with a formal semantics using formalisms such as Description Logics. The entities represented in formal ontologies are abstract types, not language terms. Although individuals (e.g. the hypertension of patient X) are not represented in the ontology, the main purpose of an ontology is to classify individual entities by defining and organizing the semantic types they are instances of [13,14]. Taxonomic reasoning is the most important mechanism of machine inference applied to formal ontologies.
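Taxonomic reasoning can be illustrated with a minimal Python sketch that answers subsumption questions over an is-a hierarchy. The class identifiers, the `IS_A` table and both function names are hypothetical. A real description logic reasoner does considerably more (it infers the hierarchy from class definitions), so this only shows the transitive closure step over an already stated hierarchy.

```python
# Hypothetical is-a hierarchy: child class -> set of direct parent classes.
# The identifiers are illustrative and do not come from any real ontology.
IS_A = {
    "essential_hypertension": {"hypertension"},
    "secondary_hypertension": {"hypertension"},
    "hypertension": {"vascular_disorder"},
    "vascular_disorder": {"disorder"},
}


def ancestors(cls, is_a=IS_A):
    """Return every class that subsumes `cls` (transitive closure of is-a)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(cls, candidate, is_a=IS_A):
    """True if `candidate` is `cls` itself or one of its ancestors."""
    return cls == candidate or candidate in ancestors(cls, is_a)
```

With such a function, data indexed with a specific class (here `essential_hypertension`) can be retrieved by a query expressed at a more general level (here `vascular_disorder`).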

b) Indexing structured data: binding domain content models to terminology systems

Recent work addresses the issue of defining the relation between clinical information models (HL7 Version 3 or clinical archetypes) and ontologies (e.g. SNOMED CT) [15]. Large-scale implementations have been carried out (e.g. the UK Connecting for Health project) [16]. A consensus regarding the framework needed to deal with this problem has been reached as one of the outcomes of the EU-funded network of excellence Semantic Mining. Universal knowledge about entities of the world is represented within knowledge models (“ontologies of reality”, i.e. terminology systems), whereas known facts about concrete clinical cases are represented in information models (ontologies of information). The ontology provides the framework of types/classes along which individual objects can be classified (e.g. hypertension); the information model determines the selection of what is described, its granularity and its context (e.g. how the blood pressure is measured, whether the diagnosis of hypertension is suspected or confirmed). “Information models” describe the EHR content, whereas “ontologies of reality” describe real entities with taxonomies and (formal) descriptions. Domain content models (e.g. archetypes, CDA r2-based templates) can have mappings to any or all ontologies of reality.

c) Indexing unstructured data: binding clinical language units to terminology resources

Class labels of formal ontologies are “preferred terms” that are often intended for terminologists rather than end users, since they are not always terms actually used in daily practice. In order to map the content of clinical language to the classes of a formal ontology, it is necessary to provide a link between the ontology and domain terms. Domain terms can be defined as synonyms of class labels, or a dictionary can be provided as a separate data structure that is not an integral part of the ontology itself. Each dictionary entry corresponds to a domain term and is linked to one class in the ontology (or to two or more, in the case of polysemous terms).
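The separate dictionary described above can be sketched in Python as a table from domain terms to ontology classes, used here for a very naive form of narrative indexing. The terms, the class identifiers and the `index_text` function are hypothetical. Real NLP pipelines also handle inflection, negation and disambiguation, none of which is attempted here.

```python
import re

# Hypothetical dictionary kept separate from the ontology: each domain term
# (as written in daily practice) points to one or more ontology class ids.
TERM_DICTIONARY = {
    "high blood pressure": {"CLS:hypertension"},
    "htn": {"CLS:hypertension"},
    "cold": {"CLS:common_cold", "CLS:low_temperature"},  # polysemous term
}


def index_text(text, dictionary=TERM_DICTIONARY):
    """Return {term: class ids} for every dictionary term found in `text`."""
    normalized = text.lower()
    hits = {}
    for term, classes in dictionary.items():
        # Whole-word match so that "cold" does not match "scolded".
        if re.search(r"\b" + re.escape(term) + r"\b", normalized):
            hits[term] = set(classes)
    return hits
```

A polysemous term such as "cold" is returned with all of its candidate classes. Choosing between them is left to a later disambiguation step that this sketch does not cover.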

3) Terminology server

To achieve interoperability within and across disparate healthcare IT systems, the terminology resources used to index both narrative and structured clinical information need to be synchronized across the various applications (at the local, regional, national or international level).

For structured clinical information, terminology binding issues are addressed at the level of the Value Sets defined for domain content models (CDA r2-based templates, archetypes, DICOM objects). These Value Sets need to be synchronized across the various applications. The Integration Profile “Value Sets Sharing” only provides mechanisms for synchronizing Value Sets across IT systems. The “Terminology Sharing” profile additionally addresses the terminology services that support Value Set authoring (the terminology binding process performed while creating a new domain content model).
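To make the notion of a Value Set bound to an element of a domain content model more concrete, here is a minimal Python sketch. The `Code` and `ValueSet` classes, the identifiers and the "diagnosis certainty" element are assumptions made for illustration. They do not reproduce the data model of any existing profile or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    system: str  # identifier of the Code System (hypothetical here)
    value: str   # concept code within that Code System


@dataclass(frozen=True)
class ValueSet:
    identifier: str
    version: str
    codes: frozenset  # the Codes allowed for the bound data element

    def allows(self, code):
        """True if `code` is a member of this Value Set."""
        return code in self.codes


# Hypothetical Value Set bound to a "diagnosis certainty" template element.
CERTAINTY = ValueSet(
    identifier="VS:diagnosis-certainty",
    version="1",
    codes=frozenset({
        Code("EX-SYSTEM", "suspected"),
        Code("EX-SYSTEM", "confirmed"),
    }),
)
```

Because a code is identified by the pair (Code System, value), the same value coming from another Code System is not a member of the Value Set. Carrying an explicit `version` is what lets consuming applications detect that their copy needs to be synchronized.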

For narrative clinical documentation, the “Terminology Sharing” profile will define terminology services that support the indexing of narrative documents and enhance the capabilities of search engines over narrative documents. The “Terminology Sharing” profile will provide solutions for a terminology server to distribute and update, across IT systems, a whole terminology resource as complex as SNOMED CT (or subsets of a terminology resource).

The “Terminology Sharing” profile also addresses the synchronization of terminology resources between the editor of the resources (e.g. the IHTSDO for SNOMED CT) and terminology servers, and between terminology servers at different levels (international, national, regional, local).

This requires developing terminology services between:

- 1) Terminology Providers (e.g. the IHTSDO for SNOMED CT) and the Terminology Administrators in charge of terminology servers: Terminology Providers (using Terminology Creator Sources) distribute terminological resources (Code Systems) to the Terminology Administrators in charge of terminology servers (Terminology Repositories/Registries).

- 2) Terminology Administrators of terminology servers and Terminology Users of terminology-enabled IT applications: Terminology Administrators of terminology servers (Terminology/Mapping/Value Sets Repositories/Registries) distribute updated, inter-related Code Systems and Value Sets to Terminology Users of IT applications (Terminology Consumers).

- 3) Terminology Administrators in charge of different terminology servers (TS), i.e. synchronization between international, national, regional and local TS. At the level of a terminology server, Terminology Authors/Curators define and maintain the relationships between Code Systems and domain content models (Value Sets Source) and the relationships between different terminological resources (Code Systems) (Terminology Mappings Source). Terminology Administrators of terminology servers distribute these Value Sets, mappings and other extensions to existing Code Systems.


Existing issues:

1) There is a wide variety of terminology resources (Code Systems) with different levels of complexity (from simple lists of terms to description logic-based ontologies).

2) Terminology services should be useful for applications that index structured data but also for applications that index narrative documents. The latter may require resources other than terminology resources (lexicons, textual synonyms, etc.).

3) Synchronization mechanisms have to be defined between terminology servers at different levels (international, national, regional, local).

Key Use Case

Integrating Terminology Providers and terminology servers

Use case 1.1: Importing a whole terminological resource (Code System). An entire Code System is sent by the Terminology Source to subscribing systems (Terminology Registry/Repository). These systems must import the Code System.

Use case 1.2: Importing an update of a terminological resource (Code System). If a subset of the concepts/terms/codes of a Code System is added, removed or changed, only the parts that have changed are sent to the Terminology Registry/Repository, not the full Code System. Concepts/terms/codes that have been removed from the Code System must no longer be used by the receiving system; for backward compatibility they should not be deleted but flagged as disabled/invalid. Newly added codes may be used from the effective date/time given in the transaction.

Use case 3.x: Authoring terminology mappings.

Use case 4.x: Authoring an extension to an existing terminological resource (Code System).

Use case 5.x: Authoring a Value Set. While creating a new Value Set, if the Value Set Source needs a concept/term/code or subset of concepts/terms/codes that it does not yet know, it queries the Terminology Source (or the Terminology Registry/Repository) for the new code or subset.
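The update rules of Use case 1.2 (send only what changed, flag removed codes instead of deleting them, honour the effective date of new codes) can be sketched as follows in Python. The `Concept` class, the function names and the shape of the update (three separate collections plus an effective date) are assumptions made for this illustration. The white paper does not define a transaction format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Concept:
    code: str
    label: str
    active: bool = True
    effective_from: Optional[datetime] = None


def apply_update(code_system, added, changed, removed, effective):
    """Apply a partial update in place to `code_system` (dict: code -> Concept)."""
    for code, label in added.items():
        # New codes become usable from the effective date of the transaction.
        code_system[code] = Concept(code, label, True, effective)
    for code, label in changed.items():
        code_system[code].label = label
    for code in removed:
        # Removed codes are kept but flagged, for backward compatibility.
        code_system[code].active = False


def usable(code_system, code, at):
    """True if `code` may be used for new data at time `at`."""
    concept = code_system.get(code)
    if concept is None or not concept.active:
        return False
    return concept.effective_from is None or at >= concept.effective_from
```

Keeping flagged concepts in the receiving system means that historical records coded with a retired code can still be displayed and interpreted, while `usable` prevents the code from being chosen for new entries.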


Integrating terminology servers and terminology enabled IT applications

Use case 1.3: Exporting a whole terminological resource (Code System). An entire Code System is sent by the terminology server (Terminology Registry/Repository) to subscribing systems (Terminology Consumers). These systems must import the Code System.

Use case 1.4: Exporting an update of a terminological resource (Code System). If a subset of the concepts/terms/codes of a Code System is added, removed or changed, only the parts that have changed are sent to the Terminology Consumer, not the full Code System. Concepts/terms/codes that have been removed from the Code System must no longer be used by the receiving system; for backward compatibility they should not be deleted but flagged as disabled/invalid. Newly added codes may be used from the effective date/time given in the transaction.

Use case 2.x: Browsing or querying a terminological resource (Code System), Value Sets or mappings. If a local application needs a concept/term/code or subset of concepts/terms/codes that it does not yet know, the Terminology Consumer queries the Terminology Repository/Registry for the new code or subset (for example while designing a questionnaire for structured data entry, or possibly a text mining algorithm for indexing narrative documents). If the concept/term/code or subset is not available in the Terminology Repository/Registry, the terminology server queries the Terminology Source for it.
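The two-step lookup of Use case 2.x (ask the Terminology Repository/Registry first, and let the terminology server fall back to the Terminology Source when the code is not available) can be sketched in Python as below. Both classes and their `lookup` method are hypothetical stand-ins, not actor interfaces defined by the profile. Keeping a local copy of the fetched answer is an additional assumption of this sketch.

```python
class TerminologySource:
    """Stand-in for the provider's authoritative content."""

    def __init__(self, concepts):
        self._concepts = dict(concepts)
        self.queries = 0  # counts how often the source was actually asked

    def lookup(self, code):
        self.queries += 1
        return self._concepts.get(code)


class TerminologyRepository:
    """Stand-in for a terminology server that keeps what it has fetched."""

    def __init__(self, source, concepts=None):
        self._source = source
        self._concepts = dict(concepts or {})

    def lookup(self, code):
        if code in self._concepts:
            return self._concepts[code]
        # Not held locally: ask the Terminology Source, then keep the answer.
        label = self._source.lookup(code)
        if label is not None:
            self._concepts[code] = label
        return label
```

From the point of view of the Terminology Consumer there is a single call, `repository.lookup(code)`. Whether the answer came from the local repository or from the source is hidden behind the terminology server.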

Integrating different terminology servers

Use case 3.x: Authoring and distributing terminology mappings.

Use case 4.x: Authoring and distributing an extension to an existing terminological resource (Code System).

Use case 5.x: Authoring and distributing a Value Set. While creating a new Value Set, if the Value Set Source needs a concept/term/code or subset of concepts/terms/codes that it does not yet know, it queries the Terminology Source (or the Terminology Registry/Repository) for the new code or subset.

Standards & Systems

The Health Level Seven (HL7) Version 3 standards are based on a Reference Information Model (RIM). Representation of information within this model is dependent on the availability of terminological resources or “Code Systems” which can be used to populate the properties of the model with appropriate semantic content. Whenever possible, the HL7 Version 3 standard references existing Code Systems instead of attempting to create a new resource within the standard itself.

The Common Terminology Services (CTS 1) specification was developed to provide a set of API calls that represent the core functionality that will be needed by basic HL7 Version 3 applications to manage and access terminologies. It defines the minimum set of functions required for terminology interoperability within the scope of HL7’s messaging and vocabulary browsing requirements [17, 18].

The Healthcare Services Specification Project (HSSP) is a joint endeavor between HL7 (the HL7 Service Oriented Architecture group (SOA SIG) of the Electronic Health Record Technical Committee (EHR TC)) and the Object Management Group (the OMG Healthcare Domain Task Force (HDTF)) [19]. The objectives of the HSSP are to facilitate the development of a set of implementable interface standards supporting agreed-upon service specifications and to stimulate the adoption and use of standardized “plug-and-play” services by healthcare software product vendors. HL7 takes responsibility for identifying functional requirements, information needs and conformance criteria. These requirements are laid out in a Service Functional Model (SFM) for each HSSP service. The OMG then uses the SFM to develop a “Request for Proposal” (RFP). HSSP ensures that the HL7 membership remains actively involved throughout the RFP creation and submission evaluation process. HL7 SFMs specify the functional requirements of a service, while the OMG RFPs specify its technical requirements (Service Technical Model (STM)).

As part of HSSP, the Common Terminology Services 2 (CTS 2) specification aims at defining the functional specifications of a set of service interfaces that allow the representation, access and maintenance of terminology content, either locally or across a federation of service nodes. CTS 2 is a reformatting, update and extension of CTS 1. Specifically, it seeks to establish a common model for terminology, to rephrase CTS in terms of the Service Functional Model (SFM), and to remove the existing implementation specification so as to enable re-specification using standardized tools and mappings that have emerged since the original CTS adoption [20]. CTS 2 also aims to specify the interactions between terminology providers and consumers, to describe how mappings between compatible terminologies and data models are defined, exchanged and revised, and to describe how logic-based terminologies can be queried about subsumption, classification and inferred relationships.

In the IHE Laboratory TF (LAB-TF), the exchange of code sets and associated rules shared by multiple actors is taken care of by a dedicated integration profile called “Laboratory Code Set Distribution” (LCSD) which is based on HL7 V2.X Master Files.

Technical Approach

An approach similar to that of ITI-XDS is adopted for the distribution of terminology resources.

[Figure: White Paper Terminology Sharing.jpg]

Existing actors (actors in blue from integration profile "Terminology Value Sets Sharing")

Value Set Source (defines relationships between Value Sets and Code Systems and distributes updated Value Sets to IT applications (Terminology Repositories/Registries))

Value Set Repository

Value Set Registry

Value Set Consumer

New actors

Terminology Creator Source (maintains and distributes a terminology resource (Code System))

Terminology/Mapping Source (maintains up-to-date versions of existing terminology resources and the relationships between different terminology resources (Code Systems), and distributes inter-related Code Systems to IT applications (Terminology Repositories/Registries and Terminology Consumers))

Terminology/Mapping/Value Sets Repository

Terminology/Mapping/Value Sets Registry

Terminology/Mapping/Value Sets Consumer

Existing transactions (transactions in italic from integration profile "Terminology Value Sets Sharing")

Provide & Register Value Sets

Registry Query Store Value Sets

Retrieve Value Sets

New transactions (standards used)

Provide & Register Terminology Sets and/or Mappings

Update Terminology Sets and/or Mappings

Registry Query Store Terminology Sets and/or Mappings

Retrieve Terminology Sets and/or Mappings

Provide & Register a Reference Terminology (from the editor)

Impact on existing integration profiles

<Indicate how existing profiles might need to be modified.>

New integration profiles needed

<Indicate what new profile(s) might need to be created.>

Breakdown of tasks that need to be accomplished

<A list of tasks would be helpful for the technical committee who will have to estimate the effort required to design, review and implement the profile.>

Support & Resources

In France, many hospitals are deploying a new generation of Clinical Information Systems, and the government is conducting a national-scale Personal Health Record (PHR) project. The GIP-DMP (the French public organization in charge of the PHR), the GMSIH (the Association in Charge of the Modernization of the Healthcare Information Systems), researchers from INSERM (the National Institute of Health and Medical Research) and CNRS (the National Center for Scientific Research), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. There is also strong interest from industry, namely from companies such as Thales, Mondeca and DBmotion.

Risks

<List technical or political risks that will need to be considered to successfully field the profile.>

Open Issues

The “Terminology Sharing” profile could provide solutions for a terminology server to distribute and update across IT systems:

- Mapping between reference terminologies (“officially” distributed) and interface terminologies and/or local terminologies [21,22]

- Domain content models and corresponding Value Sets

The “Terminology Sharing” profile could provide solutions to maintain the consistency between post-coordinated items.

Beyond the “Terminology Sharing” profile, another profile will be needed to address the issue of distributing and synchronizing domain content models.


References

[1] Kalra D. Electronic Health Record Standards. Methods Inf Med 2006; 45 Suppl 1:107-13.

[2] Schadow G, Mead CN, Walker DM. The HL7 reference information model under scrutiny. Stud Health Technol Inform 2006; 124:151-6.

[3] Cimino J.J. Terminology Tools: State of the Art and Practical Lessons. Meth Inform Med 2001:298-307.

[4] Cimino J.J. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine, 37(4/5):394–403, 1998.

[5] Cornet R, De Keizer N, Abu-Hanna A. A Framework for Characterizing Terminological Systems. Methods of Information in Medicine, 2006; 45: 253-266.

[6] ISO/TC251 WG3. Standard specification for quality indicators for controlled health vocabularies; 2000 July. Report n° TS 17117.

[7] Rector AL. Clinical terminology: why is it so hard? Methods Inf Med. 1999 Dec;38(4-5):239-52.

[8] The OBO Foundry: http://obofoundry.org/, 2007.

[9] SNOMED Clinical Terms. 2007. International Health Terminology Standards Development Organization (IHTSDO) http://www.ihtsdo.org/.

[10] Rubin DL, Lewis SE, Mungall CJ, Misra S, Westerfield M, Ashburner M, Sim I, Chute CG, Solbrig H, Storey MA, Smith B, Day-Richter J, Noy NF, Musen MA. National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS. 2006 Summer;10(2):185-98. Review.

[11] Spackman K, Reynoso G. Examining SNOMED from the perspective of formal ontological principles: Some preliminary analysis and observations. 72-80. In: U. Hahn (Ed.): KR-MED 2004, First International Workshop on Formal Biomedical Knowledge Representation, Proceedings of the KR 2004 Workshop on Formal Biomedical Knowledge Representation, Whistler, BC, Canada, 1 June 2004. CEUR Workshop Proceedings 102 CEUR-WS.org 2004.

[12] Schulz S, Suntisrivaraporn B, Baader F. SNOMED CT's problem list: ontologists' and logicians' therapy suggestions. Medinfo. 2007;12(Pt 1):802-6.

[13] Schulz S, Stenzhorn H. Ten theses on clinical ontologies. Stud Health Technol Inform. 2007;127:268-75.

[14] Bodenreider O, Smith B, and Burgun A. The ontology-epistemology divide: A case study in medical terminology. In Achille C. Varzi and Laure Vieu, editors, Formal Ontology in Information Systems. Proceedings of the 3rd International Conference - FOIS 2004, pages 185–195. Amsterdam etc.: IOS Press, 2004.

[15] Rector A, Qamar R, Marley T. Binding Ontologies & Coding Systems to Electronic Health Records and Messages. In: Bodenreider O, editor. Formal Biomedical Knowledge Representation (KR-MED 2006) CEUR; 2006. p. 11-19.

[16] http://www.hl7.org/Special/Committees/terminfo/.

[17] CTS 1. http://informatics.mayo.edu/LexGrid/downloads/CTS/specification/ctsspec/cts.htm

[18] http://informatics.mayo.edu/LexGrid/

[19] HSSP. http://hssp.wikispaces.com/

[20] CTS 2. http://hssp.wikispaces.com/Active+Work+Products

[21] Daniel-Le Bozec C, Steichen O, Dart T, Jaulent M-C. The role of local terminologies in electronic health records. The HEGP experience. Medinfo 2007, vol 12, p.780-84.

[22] Rosenbloom ST, Miller RA, Johnson KB. Interface terminologies: facilitating direct entry of clinical data into electronic health record systems. J Am Med Inform Assoc. 2006 May-Jun;13(3):277-88.