Difference between revisions of "Terminology Sharing White paper"

From IHE Wiki

Revision as of 01:43, 7 November 2007

Proposed Profile: Terminology Sharing White paper

  • Proposal Editor: Christel Daniel (AP-HP, INSERM, Paris)
  • Editor: Christel Daniel (AP-HP, INSERM, Paris), Ana Esterlich (GIP-DMP), François Gareil (Thales), Charles Rica (GIP DMP), Karima Bourquard (GMSIH), Jean Delahousse (Mondeca), Pierre Zweigenbaum (LIMSI, CNRS)
  • Date: N/A (Wiki keeps history)
  • Version: N/A (Wiki keeps history)
  • Domain: IT Infrastructure

Summary and Scope

To achieve interoperability within and across disparate healthcare IT systems, terminological resources for indexing both narrative and structured clinical information need to be synchronized across the various applications (at the local, regional, national, or international level).

Existing standards (e.g. Common Terminology Services) address the issue of sharing terminology resources.

This white paper extends the Integration Profile “Sharing Value Sets” to address the following issues:

- Defining and updating Value Sets ("how mappings between terminological resources (Code Systems) and data models (e.g. CDA r2-based templates or DICOM objects) are defined and revised").

- Defining and distributing Terminology mappings (how mappings between existing terminological resources (Code Systems) are defined, distributed and revised, allowing, for example, classification and aggregation of data at different levels).

- Terminology correction or authoring ("how Terminology (Code Systems) users submit unambiguous requests for corrections and extensions to Terminology providers (e.g. the IHTSDO for SNOMED CT) and how revisions to content are identified, distributed and integrated into running systems"; how Terminology (Code Systems) users define and distribute local codes in addition to existing Code Systems).
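As a rough illustration of the second item, the sketch below shows how a mapping between two Code Systems could support aggregation of recorded data at a coarser level. It is only a sketch: the codes (`LOC-001`, `CLS-A`, and so on), the `LOCAL_TO_CLASS` table and the `aggregate` function are hypothetical names invented for this example and do not come from any real Code System or from the profile.

```python
from collections import Counter

# Hypothetical mapping from fine-grained local codes to coarser classification
# codes. All codes are invented for illustration only.
LOCAL_TO_CLASS = {
    "LOC-001": "CLS-A",
    "LOC-002": "CLS-A",
    "LOC-003": "CLS-B",
}


def aggregate(recorded_codes, mapping):
    """Count recorded codes per target class.

    Codes without a mapping are returned separately so that they can be
    reported to whoever maintains the mapping, rather than silently dropped.
    """
    counts = Counter()
    unmapped = []
    for code in recorded_codes:
        target = mapping.get(code)
        if target is None:
            unmapped.append(code)
        else:
            counts[target] += 1
    return dict(counts), unmapped


if __name__ == "__main__":
    counts, unmapped = aggregate(
        ["LOC-001", "LOC-002", "LOC-003", "LOC-999"], LOCAL_TO_CLASS
    )
    print("per class:", counts)
    print("unmapped:", unmapped)
```

Keeping the mapping as data separate from the aggregation logic is what would allow a terminology server to revise and redistribute it without changes to the consuming application.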

The “Terminology Sharing” profile will define business scenarios for Terminology Services for narrative document indexing and mining (how terminological resources (Code Systems) are distributed to IT applications for efficient use of Natural Language Processing (NLP) algorithms).


- To synchronize Terminology/Value Sets/Mapping Registries/Repositories across Terminology Servers (e.g. the local Terminology Server of a hospital and the national Terminology Server of the personal EHR)

This requires developing terminology services between:

- 1) Terminology Providers (e.g. the IHTSDO for SNOMED CT) and Terminology Administrators in charge of terminology servers: Terminology Providers (using Terminology Creator Sources) distribute terminological resources (Code Systems) to Terminology Administrators in charge of terminology servers (Terminology Repositories/Registries).

- 2) Terminology Administrators of terminology servers and Terminology Users using terminology-enabled IT applications: Terminology Administrators of terminology servers (Terminology/Mapping/Value Sets Repositories/Registries) distribute updated inter-related Code Systems and Value Sets to Terminology Users using IT applications (Terminology Consumers).

- 3) Terminology Administrators in charge of different terminology servers (TS) (synchronization between international, national, regional and local TS). At the level of a terminology server, Terminology Authors/curators define and maintain the relationships between Code Systems and domain content models (Value Sets Source) and the relationships between different terminological resources (Code Systems) (Terminology mappings Source). Terminology Administrators of terminology servers distribute these Value Sets, Mappings and other extensions to existing Code Systems.


Task force:

- HSSP and CTS2 (see Standards and Systems section, below)

- In France, many hospitals are deploying a new generation of Clinical Information Systems, and the government is conducting a national-scale Personal Health Record (PHR) project. The GIP-DMP (the French public organization in charge of the PHR), the GMSIH (the Association in Charge of the Modernization of the Healthcare Information Systems), researchers from INSERM (the National Institute of Health and Medical Research) and CNRS (the National Center for Scientific Research), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. There is strong interest from industry, namely from companies such as Thales, Mondeca, and DBmotion.

The main challenge is to propose an operational solution for sharing the semantics of the EHR (narrative documents and structured documents based on domain content models such as CDA r2 or DICOM objects) while avoiding inconsistent implementations and heterogeneity of technical solutions across domains.

The Problem

1) Semantic interoperability and semantic mining.

Existing clinical information systems contain heterogeneous data, much of it still in unstructured form, such as narrative reports (imaging reports or discharge summaries), notes and correspondence. A complete view is often lacking because these data are scattered across different sub-systems within different departments or healthcare provider organizations and are not indexed through the consistent use of shared terminologies. Systems that have to share clinical data also have to share well-defined and unambiguous knowledge of the meaning of these data. Accessing and using shared terminological resources is a common and necessary function for many healthcare IT applications. The new generation of Clinical Information Systems (HIS, EHR, personal EHR) has to make use of health informatics knowledge representation contributions such as biomedical ontologies (e.g. ontologies from the OBO Foundry, SNOMED CT) and domain content models (such as HL7 RIM-based templates or CEN archetypes) [1,2]. Binding the domain content models used for structured data entry in CIS to terminological resources does not only support the goal of semantic interoperability. A coherent knowledge view of clinical data is also useful for connecting them with medical knowledge (e.g. published medical literature, clinical guidelines and decision support modules). It also allows the use of new semantic mining algorithms. Indeed, using formal ontologies to index clinical information makes it subject to formal, classification-based reasoning.

2) Binding clinical data and terminology resources

a) Terminology systems

Issues concerning the representation of knowledge and the definition of terminology services for patient care and clinical research have been intensively discussed in the Medical Informatics community over the last two decades [3-7]. Existing biomedical terminology resources (vocabularies, thesauri, classifications, nomenclatures, ontologies, etc.) were often initially created for specific purposes and scopes. New approaches now aim at providing large-scale terminological resources for semantic interoperability, such as the Open Biomedical Ontologies (OBO) Foundry approach for the basic biomedical sciences [8] and the SNOMED CT clinical terminology for patient care [9-11]. Formal biomedical ontologies define classes or types of entities, whose properties are expressed by a formal semantics using formalisms such as Description Logics [12]. The entities represented in formal ontologies are abstract types, not language terms. Although individuals (e.g. the hypertension of patient X) are not represented in the ontology, the main purpose of an ontology is to classify individual entities by defining and organizing the semantic types they are instances of [13,14]. Taxonomic reasoning is the most important mechanism of machine inference applied to formal ontologies.
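To make the notion of taxonomic reasoning concrete, the following minimal sketch checks subsumption by walking an is-a hierarchy. It is an illustration under simplifying assumptions, not a Description Logic reasoner: the class names and the `IS_A` table are invented for the example, and only explicitly asserted is-a links are followed.

```python
# Hypothetical is-a hierarchy: each class maps to the set of its direct parents.
# Class names are invented for illustration and are not real ontology content.
IS_A = {
    "EssentialHypertension": {"Hypertension"},
    "Hypertension": {"VascularDisorder"},
    "VascularDisorder": {"Disorder"},
    "Disorder": set(),
}


def ancestors(cls, is_a):
    """Return every class that subsumes `cls` (transitive closure of is-a)."""
    seen = set()
    stack = list(is_a.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def is_subsumed_by(cls, candidate, is_a):
    """True if `cls` is the same class as `candidate` or one of its descendants."""
    return cls == candidate or candidate in ancestors(cls, is_a)


if __name__ == "__main__":
    # A record indexed with the specific class is retrieved by a query
    # on any of the more general classes.
    print(is_subsumed_by("EssentialHypertension", "Disorder", IS_A))
```

This is the kind of inference that lets data indexed with a specific class be retrieved or aggregated under a more general one.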

b) Indexing structured data: binding domain content models to terminology systems

Recent work addresses the issue of defining the relation between clinical information models (HL7 Version 3 or clinical archetypes) and ontologies (e.g. SNOMED CT) [15]. Large-scale implementations have been carried out (e.g. the UK Connecting for Health project) [16]. A consensus regarding the framework needed to deal with this problem has been reached as one of the outcomes of the EU-funded network of excellence Semantic Mining. Universal knowledge about entities of the world is represented within knowledge models (“ontologies of reality”, i.e. terminology systems), whereas known facts about concrete clinical cases are represented in information models (“ontologies of information”). The ontology provides the framework of types/classes along which individual objects can be classified (e.g. hypertension), while the information model determines the selection of what is described, its granularity, as well as its context (e.g. how the blood pressure is measured, whether the diagnosis of hypertension is suspected or confirmed). “Information models” describe the EHR content whereas the “ontologies of reality” describe real entities with taxonomies and (formal) descriptions. Domain content models (e.g. archetypes, CDA r2-based templates) can have mappings to any or all ontologies of reality.

c) Indexing unstructured data: binding clinical language units to terminology resources

Class labels of formal ontologies are "preferred terms" that are often intended for terminologists rather than end users, since they are not always terms actually used in daily practice. In order to map the content of clinical language to classes of a formal ontology, it is necessary to provide a link between the ontology and domain terms. Domain terms can be defined as synonyms of class labels, or a dictionary can be provided as a separate data structure that is not an integral part of the ontology itself. Each dictionary entry corresponds to a domain term and is linked to one class in the ontology (or to two or more, in the case of polysemous terms).
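The dictionary described above can be sketched as a simple term-to-classes table. The example below is a hedged illustration only: the terms, the class identifiers and the `index_text` helper are hypothetical, and the whole-word matching it uses is far simpler than what a real NLP pipeline would do.

```python
import re

# Hypothetical dictionary kept separate from the ontology itself:
# each domain term is linked to one or more ontology class identifiers.
TERM_DICTIONARY = {
    "high blood pressure": ["Hypertension"],
    "hypertension": ["Hypertension"],
    "cold": ["CommonCold", "LowTemperature"],  # polysemous term: two classes
}


def index_text(text, dictionary):
    """Return the dictionary terms found in `text` with their candidate classes.

    Matching is case-insensitive and on whole words only. Polysemous terms
    are returned with all their candidate classes; disambiguation is left
    to a later step that is out of scope for this sketch.
    """
    lowered = text.lower()
    found = {}
    for term, classes in dictionary.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found[term] = list(classes)
    return found


if __name__ == "__main__":
    report = "Patient reports High Blood Pressure and a cold."
    for term, classes in index_text(report, TERM_DICTIONARY).items():
        print(term, "->", classes)
```

Because the dictionary is a separate data structure, it could be distributed and updated independently of the ontology it points to.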

3) Terminology server

To achieve interoperability within and across disparate healthcare IT systems, terminology resources for indexing both narrative and structured clinical information need to be synchronized across the various applications (at the local, regional, national, or international level).

For structured clinical information, terminology binding issues are addressed at the level of the Value Sets defined for domain content models (CDA r2-based templates, archetypes, DICOM objects). These Value Sets need to be synchronized across the various applications. The Integration Profile “Value Sets Sharing” provides only mechanisms for Value Set synchronization across IT systems. In addition, the “Terminology Sharing” profile addresses the issues of 1) terminology services dedicated to supporting the terminology binding process while creating a new domain content model and 2) distributing and synchronizing updated domain content models.

For narrative clinical documentation, the “Terminology Sharing” profile will define terminology services that support the indexing of narrative documents and enhance search engine capabilities within narrative documents. The “Terminology Sharing” profile will provide solutions for a terminology server to distribute and update across IT systems a whole terminology resource, even one as complex as SNOMED CT (or subsets of a terminology resource).

The “Terminology Sharing” profile also addresses the issue of synchronizing terminology resources between the editor of the resources (e.g. the IHTSDO for SNOMED CT) and terminology servers.

This requires that 1) terminology editors (Terminology Creator Sources) distribute terminology resources (Code Systems) to terminology servers (Terminology Sources), and 2) terminology servers (Terminology Sources) maintain relationships between different terminology resources (Code Systems) and between Code Systems and the Value Sets of different domain content models, and also distribute updated inter-related Code Systems and Value Sets to IT applications (Terminology Repositories/Registries, Terminology Consumers).

Existing issues:

1) There is a wide variety of Terminology Resources (Code Systems) with different levels of complexity (from simple lists of terms to description logic-based ontologies).

2) Terminology services should be useful for applications that index structured data but also for applications that index narrative documents. Such applications may require resources other than terminology resources (lexicons, textual synonyms, etc.). These are out of scope for this proposal.

Key Use Case

Use case 1.1: Importing a whole terminological resource (Code System). An entire Code System is sent by the Terminology Source to subscribing systems (Terminology Registry/Repository). These systems must import the Code System.

Use case 1.2: Importing an update of a terminological resource (Code System). If a subset of the concepts/terms/codes of a Code System is added, removed or changed, only the parts that have changed are sent to the Terminology Registry/Repository, not the full Code System. Concepts/terms/codes that have been removed from the Code System must no longer be used by the receiving system; for backward compatibility they should not be deleted but flagged as disabled/invalid. Newly added codes may be used from the effective date/time given in the transaction.

Use case 1.3: Exporting a whole terminological resource (Code System). An entire Code System is sent by the terminology server (Terminology Registry/Repository) to subscribing systems (Terminology Consumers). These systems must import the Code System.

Use case 1.4: Exporting an update of a terminological resource (Code System). If a subset of the concepts/terms/codes of a Code System is added, removed or changed, only the parts that have changed are sent to the Terminology Consumer, not the full Code System. Concepts/terms/codes that have been removed from the Code System must no longer be used by the receiving system; for backward compatibility they should not be deleted but flagged as disabled/invalid. Newly added codes may be used from the effective date/time given in the transaction.
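The incremental update behaviour of use cases 1.2 and 1.4 could look roughly like the sketch below on the receiving side. This is only an illustration under stated assumptions: the `Concept` record, the `apply_update` and `is_usable` functions and the shape of the update (separate added, changed and removed parts) are hypothetical and are not defined by the profile or by any existing transaction.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class Concept:
    """Hypothetical local record for one concept/term/code."""
    code: str
    display: str
    active: bool = True
    effective_from: Optional[datetime] = None


def apply_update(code_system: Dict[str, Concept],
                 added: Dict[str, str],
                 changed: Dict[str, str],
                 removed: Iterable[str],
                 effective: datetime) -> None:
    """Apply a partial update to a locally stored Code System, in place."""
    for code, display in added.items():
        # New codes become usable only from the effective date/time.
        code_system[code] = Concept(code, display, True, effective)
    for code, display in changed.items():
        if code in code_system:
            code_system[code].display = display
    for code in removed:
        if code in code_system:
            # Removed codes are flagged as disabled, never deleted,
            # so that previously recorded data can still be interpreted.
            code_system[code].active = False


def is_usable(code_system: Dict[str, Concept], code: str, at: datetime) -> bool:
    """True if `code` may be used for new data entry at time `at`."""
    concept = code_system.get(code)
    if concept is None or not concept.active:
        return False
    return concept.effective_from is None or at >= concept.effective_from
```

Keeping disabled codes in the local store is what preserves backward compatibility: data recorded before the update can still be displayed and interpreted even though the code can no longer be selected.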

Use case 2.1: Authoring a Value Set. While creating a new Value Set, if the Value Set Source needs a concept/term/code or subset of concepts/terms/codes that it does not yet know, it queries the Terminology Source (or the Terminology Registry/Repository) for the new code or the new subset.

Use case 2.2: Updating a terminology set. If a local application needs a concept/term/code or subset of concepts/terms/codes that it does not yet know, the Terminology Consumer queries the Terminology Repository/Registry for the new code or the new subset (for example while designing a questionnaire for structured data entry or a text mining algorithm for narrative document indexing). If the concept/term/code or subset of concepts/terms/codes is not available in the Terminology Repository/Registry, the terminology server queries the Terminology Source for the new code or the new subset.
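The two-step lookup of use case 2.2 can be sketched as a repository that falls back to the source when a code is missing locally. The classes and the `query` method below are hypothetical stand-ins for the actors, not an interface defined by the profile, and storing the fetched code locally afterwards is an added assumption of this sketch.

```python
class TerminologySource:
    """Hypothetical stand-in for the Terminology Source actor."""

    def __init__(self, concepts):
        self._concepts = dict(concepts)  # code -> display name

    def query(self, code):
        """Return the display name for `code`, or None if it is unknown."""
        return self._concepts.get(code)


class TerminologyRepository:
    """Hypothetical stand-in for the Terminology Repository/Registry actor."""

    def __init__(self, source, concepts=None):
        self._source = source
        self._concepts = dict(concepts or {})

    def query(self, code):
        """Answer from local content first, then fall back to the source."""
        if code in self._concepts:
            return self._concepts[code]
        display = self._source.query(code)
        if display is not None:
            # Assumption of this sketch: keep a local copy so that later
            # consumers are served without contacting the source again.
            self._concepts[code] = display
        return display


if __name__ == "__main__":
    source = TerminologySource({"X1": "Example concept one", "X2": "Example concept two"})
    repository = TerminologyRepository(source, {"X1": "Example concept one"})
    # The Terminology Consumer only ever talks to the repository.
    print(repository.query("X2"))
```

The consumer never contacts the source directly in this sketch, which mirrors the layering of actors in the use case.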

Standards & Systems

The CTS defines the minimum set of functions required for terminology interoperability within the scope of HL7’s messaging and vocabulary browsing requirements. The LexGrid model has been selected by HL7 as the vocabulary model in which the HL7 vocabulary will be represented. In the IHE Laboratory TF (LAB-TF), the exchange of code sets and associated rules shared by multiple actors is taken care of by a dedicated integration profile called “Laboratory Code Set Distribution” (LCSD) which is based on HL7 V2.X Master Files.

Technical Approach

An approach similar to that of ITI-XDS is adopted for the distribution of terminology resources.

Figure: White Paper Terminology Sharing.jpg

Existing actors (actors in blue from integration profile "Terminology Value Sets Sharing")

Value Set Source (defines relationships between Value Sets and Code Systems and distributes updated Value Sets to IT applications (Terminology Repositories/Registries))

Value Set Repository

Value Set Registry

Value Set Consumer

New actors

Terminology Creator Source (maintains and distributes a terminology resource (Code System))

Terminology/Mapping Source (maintains up-to-date terminology resources and the relationships between different terminology resources (Code Systems), and distributes inter-related Code Systems to IT applications (Terminology Repositories/Registries and Terminology Consumers))

Terminology/Mapping/Value Sets Repository

Terminology/Mapping/Value Sets Registry

Terminology/Mapping/Value Sets Consumer

Existing transactions (transactions in italic from integration profile "Terminology Value Sets Sharing")

Provide & Register Value Sets

Registry Query Store Value Sets

Retrieve Value Sets

New transactions (standards used)

Provide & Register Terminology Sets and/or Mappings

Update Terminology Sets and/or Mappings

Registry Query Store Terminology Sets and/or Mappings

Retrieve Terminology Sets and/or Mappings

Provide & Register a Reference Terminology (from the editor)

Impact on existing integration profiles

<Indicate how existing profiles might need to be modified.>

New integration profiles needed

<Indicate what new profile(s) might need to be created.>

Breakdown of tasks that need to be accomplished

<A list of tasks would be helpful for the technical committee who will have to estimate the effort required to design, review and implement the profile.>

Support & Resources

In France, many hospitals are deploying a new generation of Clinical Information Systems, and the government is conducting a national-scale Personal Health Record (PHR) project. The GIP-DMP (the French public organization in charge of the PHR), the GMSIH (the Association in Charge of the Modernization of the Healthcare Information Systems), researchers from INSERM (the National Institute of Health and Medical Research) and CNRS (the National Center for Scientific Research), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. There is strong interest from industry, namely from companies such as Thales, Mondeca, and DBmotion.

Risks

<List technical or political risks that will need to be considered to successfully field the profile.>

Open Issues

The “Terminology Sharing” profile could provide solutions for a terminology server to distribute and update across IT systems:

- Mappings between reference terminologies (“officially” distributed) and interface terminologies and/or local terminologies [17,18].

- Domain content models and corresponding Value Sets.

The “Terminology Sharing” profile could also provide solutions to maintain consistency between post-coordinated items.

Beyond the “Terminology Sharing” profile, another profile will be needed to address the issue of distributing and synchronizing domain content models.


References

[1] Kalra D. Electronic Health Record Standards. Methods Inf Med 2006; 45 Suppl 1:107-13.

[2] Schadow G, Mead CN, Walker DM. The HL7 reference information model under scrutiny. Stud Health Technol Inform 2006; 124:151-6.

[3] Cimino J.J. Terminology Tools: State of the Art and Practical Lessons. Meth Inform Med 2001:298-307.

[4] J. J. Cimino. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine, 37(4/5):394–403, 1998.

[5] R. Cornet, N. F. De Keizer, A. Abu-Hanna. A Framework for Characterizing Terminological Systems. Methods of Information in Medicine, 2006; 45: 253-266.

[6] ISO/TC251 WG3. Standard specification for quality indicators for controlled health vocabularies; 2000 July. Report n° TS 17117.

[7] Rector AL. Clinical terminology: why is it so hard? Methods Inf Med. 1999 Dec;38(4-5):239-52.

[8] The OBO Foundry: http://obofoundry.org/, 2007.

[9] SNOMED Clinical Terms. 2007. International Health Terminology Standards Development Organization (IHTSDO) http://www.ihtsdo.org/.

[10] K. Spackman, G. Reynoso: Examining SNOMED from the perspective of formal ontological principles: Some preliminary analysis and observations. 72-80. In: U. Hahn (Ed.): KR-MED 2004, First International Workshop on Formal Biomedical Knowledge Representation, Proceedings of the KR 2004 Workshop on Formal Biomedical Knowledge Representation, Whistler, BC, Canada, 1 June 2004. CEUR Workshop Proceedings 102 CEUR-WS.org 2004.

[11] Schulz S, Suntisrivaraporn B, Baader F. SNOMED CT's problem list: ontologists' and logicians' therapy suggestions. Medinfo. 2007;12(Pt 1):802-6.

[12] I. Horrocks, P. F. Patel-Schneider and F. van Harmelen, “From SHIQ and RDF to OWL: The making of a Web Ontology Language”, Journal of Web Semantics, 1(1), 7–26, 2003.

[13] Schulz S, Stenzhorn H. Ten theses on clinical ontologies. Stud Health Technol Inform. 2007;127:268-75.

[14] O. Bodenreider, B. Smith, and A. Burgun. The ontology-epistemology divide: A case study in medical terminology. In Achille C. Varzi and Laure Vieu, editors, Formal Ontology in Information Systems. Proceedings of the 3rd International Conference - FOIS 2004, pages 185–195. Amsterdam etc.: IOS Press, 2004.

[15] A. Rector, R. Qamar, T. Marley. Binding Ontologies & Coding Systems to Electronic Health Records and Messages. In: Bodenreider O, editor. Formal Biomedical Knowledge Representation (KR-MED 2006) CEUR; 2006. p. 11-19.

[16] http://www.hl7.org/Special/Committees/terminfo/

[17] C. Daniel-Le Bozec, O. Steichen, T. Dart, M-C Jaulent. The role of local terminologies in electronic health records. The HEGP experience. Medinfo 2007, vol 12, p.780-84.

[18] Rosenbloom ST, Miller RA, Johnson KB. Interface terminologies: facilitating direct entry of clinical data into electronic health record systems. J Am Med Inform Assoc. 2006 May-Jun;13(3):277-88.