Sharing Value Sets
- 1 Proposed Profile: Sharing Value Sets in a Vocabulary Domain
- 2 The Problem
- 3 3. Key Use Case
- 4 4. Standards & Systems
- 5 5. Technical Approach
- 6 6. Support & Resources
- 7 7. Risks
- 8 8. Open Issues
- 9 9. Tech Cmte Evaluation
Proposed Profile: Sharing Value Sets in a Vocabulary Domain
- Proposal Editor: Christel Daniel (AP-HP, INSERM, Paris), Karima Bourquard (GMSIH), François Gareil (Thales), Jean Delahousse (Mondeca), Norbert Lipszyc (DBmotion), Pierre Zweigenbaum (LIMSI, CNRS), Ana Estelrich (GIP-DMP), Charles Rica (GIP-DMP)
- Profile Editor: Ana Estelrich
- Date: N/A (Wiki keeps history)
- Version: N/A (Wiki keeps history)
- Domain: ITI
The Problem
Federal healthcare facilities, RHIOs, and national EHRs need an effective way to share their health information using the same clinical vocabulary. The vocabulary used to capture patient data is not uniform, which results in erroneous data capture and a lack of semantic interoperability (1). The problem arises in three main cases: the adoption of a nomenclature by a newly installed system (or the switch from a legacy nomenclature to a new one), the update of an existing value set, and the creation of a new value set from a new terminology.
The HL7 v3 Reference Information Model (RIM) version 2.14n and the terminology models are interdependent. The HL7 v3 Data Types describe the structure and properties of the data types pertaining to the Value Set. The HL7 v3 RIM, the Data Type definitions, and the HL7 Vocabulary are candidate parts of the standard to use.
The HL7 Common Terminology Services (HL7 CTS) version 1.2 - November 2004 (2) focuses on the common functionalities that an external terminology resource must be able to provide. HL7 CTS describes a set of Application Programming Interfaces (APIs), that is, source code interfaces, that HL7 v3 software can use when accessing terminological content. The message elements, as well as the message runtime and browsing APIs, are well supported by this standard.
An end-user clinical application such as a Content Creator/Consumer Actor will need a Value Set Consumer Actor in order to create or consume structured, coded content such as CDA r2 based documents or DICOM objects. The Value Set will contain values derived from one or more code systems, and it needs to be up to date so that different Content Creator/Consumer systems can interoperate. This profile will give the application access to the most recent ValueSet published by the standardization bodies via a Terminology Source Actor. In the case of a brand new installation, the application would be able to download the most recent version of the Value Set and then either map it to its internal codes or create a completely new internal nomenclature. The internal mapping will have no impact on the interoperability of the whole system to which the application is connected, since the most up-to-date official terminology is always used.
Interest in this issue is considerable among governments, healthcare facilities, and vendors.
The United States Department of Health and Human Services created the Office of the National Coordinator for Health Information Technology (ONC) in 2004 in response to the presidential call for widespread deployment of health information technology (3). To accomplish this task, agencies need to adopt the same clinical vocabularies. The Consolidated Health Informatics (CHI) initiative will establish a portfolio of existing clinical vocabularies in order to achieve semantic interoperability. The Centers for Disease Control and Prevention Public Health Information Network Vocabulary Access and Distribution System provides a web-based vocabulary server for browsing, searching, and downloading PHIN vocabularies using value sets, value set concepts, value set OIDs, or even code systems, code system concepts, or code system OIDs (4).
The Mayo Clinic is also using the Lexical Grid, a distributed network of Shared Terminology Resources (5).
The Clinical Terminology Integration (CTI) standards project evaluates and documents available standard terminologies for use in the pan-Canadian EHR (6).
These initiatives are advancing at a moderate pace in a federated environment, following different regulations.
France is in the midst of deploying a PHR (Personal Health Record) at a national scale and is willing to contribute national efforts to the Profile development. Researchers from the INSERM (The National Institute of Health and Medical Research), CNRS (National Center for Scientific Research), the GMSIH (The Association in Charge of the Modernization of the Healthcare Information Systems), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. There is also strong interest from industry, namely from companies such as Thales, Mondeca, and DBmotion.
IHE is a well-suited venue for solving this problem because all the aforementioned interoperability efforts aim to use, or already use, the IHE-ITI-XDS infrastructure. Moreover, the IHE PCC content profiles use the Clinical Document Architecture (CDA r2), an established standard for the exchange of clinical documents that specifies their structure and semantics (7), and the XDS-I profile metadata also needs a common Value Set (for example, for body parts). Since IHE-XDS is content-neutral, the profiles concerned are the content profiles. A common national terminology is of paramount importance when functional and semantic interoperability is at stake.
Today’s terminologies are becoming more and more complex. Encoding is necessary to enable automated processing and not just human interpretation of ideas and concepts in the context of structured documents, namely the content profiles using the HL7 Clinical Document Architecture or DICOM objects. Some of the benefits of encoded information are:
• The organization of information meant for human interpretation (classification of document types and section headings, data filtering and exploitation, easier navigation to related information)
• Effective indexing and retrieval of information (specific types of records or data)
• Automated translation to a different human language for human presentation (6).
Most healthcare facilities use textual information or, if they use encoding, rely on their internal codes rather than an official terminology. Distributing and implementing an official terminology is a challenging task. It has to be done when a new system is installed or when a system changes its nomenclature completely. Loading a terminology from a disk can be time-consuming, and it has to be repeated each time an updated version becomes available.
Certain concepts in a Value Set used clinically will change or become obsolete, and new ones will be added. Most of the time the charge technologist searches the internet or calls the system vendor or colleagues to find out whether a new version has become available and where to get it. If the ValueSet is not obtained quickly enough and the changes are not extensive, they are usually entered by hand, which can lead to data entry errors. A method of synchronization with the official terminologies (updating) would reduce the workload involved in such tasks.
Keeping a terminology up to date is important for the sake of interoperability. If an institution uses a different version of the values than the institution to which the document is sent, medical errors might result. To close the loop, as soon as a new terminology is uploaded or updated, an internal mapping should be established between the data elements that the clinical application uses and the data definitions used in the HL7 specifications, since this will ensure user compliance and ease of use within the coding process.
3. Key Use Case
Use-case 1: Importing a whole ValueSet
An application has just been installed, so it has no ValueSets. It will need to retrieve the whole value set. The ValueSet Consumer queries the ValueSet Registry. The ValueSet Registry will indicate where the new values are in the ValueSet Repository, as well as the metadata belonging to this Value Set (name, OID, Assigning Auth. Version). The Value Set Consumer will retrieve the new Value Set and integrate it into the application (Content Creator/Consumer); how this integration is done remains to be defined.
The metadata of the value set are stored with the ValueSet Consumer and associated to the ValueSet for further reference and update.
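The import flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`RegistryEntry`, `query`, `retrieve`, `import_value_set`), the `location` field, and the sample OID and codes are all hypothetical and are not defined by this proposal or by any IHE transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata the registry keeps for one published value set version."""
    name: str
    oid: str
    assigning_authority: str
    version: str
    location: str  # hypothetical pointer into the repository


class ValueSetRegistry:
    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def query(self, oid):
        """Return the most recently registered entry for an OID, or None."""
        matches = [e for e in self._entries if e.oid == oid]
        return matches[-1] if matches else None


class ValueSetRepository:
    def __init__(self):
        self._content = {}

    def store(self, location, codes):
        self._content[location] = dict(codes)

    def retrieve(self, location):
        return dict(self._content[location])


class ValueSetConsumer:
    def __init__(self, registry, repository):
        self._registry = registry
        self._repository = repository
        self.local = {}  # oid -> (metadata, codes), kept for later updates

    def import_value_set(self, oid):
        entry = self._registry.query(oid)  # ask the registry where it is
        if entry is None:
            raise LookupError(f"no value set registered under OID {oid}")
        codes = self._repository.retrieve(entry.location)  # fetch the content
        self.local[oid] = (entry, codes)  # store metadata with the value set
        return entry


if __name__ == "__main__":
    registry, repository = ValueSetRegistry(), ValueSetRepository()
    repository.store("loc-1", {"X1": "Concept one", "X2": "Concept two"})
    registry.register(
        RegistryEntry("Example", "1.2.3", "Example Authority", "1.0", "loc-1"))
    consumer = ValueSetConsumer(registry, repository)
    print(consumer.import_value_set("1.2.3").version)
```

Keeping the metadata next to the codes in `local` mirrors the requirement that the ValueSet Consumer store them for further reference and update.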
Use-case 2: Updating from a ValueSet
An application already contains a ValueSet. It receives notification of an update, or it queries to see whether a new version is available (which of the two is more efficient and technically feasible remains to be determined). The ValueSet Consumer queries the ValueSet Registry with regard to the Value Set in question. The parameters are its name, OID, Assigning Auth. Version. The ValueSet Registry will indicate where the new values are in the ValueSet Repository, as well as the metadata belonging to this Value Set (name, OID, Assigning Auth. Version). The ValueSet Consumer will retrieve the new additions to the Value Set or the newly inactive codes.
The metadata of the value set are stored with the ValueSet Consumer and associated to the ValueSet for further reference and update.
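A minimal sketch of the update step, assuming that versions are dotted numeric strings and that an update consists of added codes and codes that became inactive. The function names and sample codes are hypothetical. Flagging inactive codes instead of deleting them is one possible choice; the proposal leaves the handling of obsolete values open.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def needs_update(local_version, registry_version):
    return parse_version(registry_version) > parse_version(local_version)


def compute_delta(old_codes, new_codes):
    """Compare two published versions, each mapping code -> display name."""
    added = {c: name for c, name in new_codes.items() if c not in old_codes}
    inactivated = sorted(c for c in old_codes if c not in new_codes)
    return added, inactivated


def apply_delta(local, added, inactivated):
    """Return an updated copy of the local table (code -> display/active)."""
    updated = {code: dict(entry) for code, entry in local.items()}
    for code, name in added.items():
        updated[code] = {"display": name, "active": True}
    for code in inactivated:
        if code in updated:
            updated[code]["active"] = False  # keep the code, mark it inactive
    return updated


if __name__ == "__main__":
    v1_3 = {"A": "Alpha", "B": "Beta"}
    v1_5 = {"A": "Alpha", "C": "Gamma"}
    local = {c: {"display": n, "active": True} for c, n in v1_3.items()}
    if needs_update("1.3", "1.5"):
        added, inactivated = compute_delta(v1_3, v1_5)
        local = apply_delta(local, added, inactivated)
    print(local)
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.10" would sort before "1.9".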
Use-case 3: Creating a new ValueSet
Prior to creating a new Value Set, the ValueSet Source queries the Terminology Registry in order to obtain the metadata attached to the terminology (SNOMED, LOINC, etc.) from which the new Value Set is to be created. The terminology in question is then retrieved from the Terminology Repository and the Value Set is created with the right terminology references.
The ValueSet Source makes a [Provide & Register VS] transaction. The new Value Set is then stored into the ValueSet Repository, and registered as a new entry with its metadata in the Value Set Registry.
Fig. 2: Creating a brand new value set
Use-case 4: Updating a ValueSet
The ValueSet Source makes a [Provide & Register Updated VS] transaction. The updated Value Set is then stored in the ValueSet Repository, and its metadata and new version number are registered in the Value Set Registry.
Fig. 3: Updating a value set
4. Standards & Systems
The HL7 v3 Reference Information Model (RIM) version 2.14n and the terminology models are interdependent. The HL7 v3 Data Types describe the structure and properties of the data types pertaining to the Value Set. The HL7 v3 RIM, the Data Type definitions, and the HL7 Vocabulary are all suitable parts of the standard to use.
The HL7 CTS version 1.2 - November 2004 (2) specifies the common functional characteristics that an external terminology must be able to provide and defines an Application Programming Interface (API) that HL7 version 3 software can use when accessing terminological content. The standard can be downloaded from: http://informatics.mayo.edu/LexGrid/downloads/CTS/specification/ctsspec/cts.htm
The standard states that there are two layers between the HL7 message processing applications and the target vocabularies. The upper layer, the Message API, communicates with the messaging software, and it does so in terms of vocabulary domains, contexts, value sets, coded attributes, and other artifacts of the HL7 message model. The lower layer, the Vocabulary API, communicates with the terminology service software, and does so in terms of code systems, concept codes, designations, relationships, and other terminology-specific entities. The Message API is specific to HL7. It allows a wide variety of message processing applications to create, validate, and translate CD-derived data types in a consistent and reproducible fashion. The Vocabulary API is intended to be generic. It allows applications to query different terminologies in a consistent, well-defined fashion. The Message API uses the Vocabulary API. A list of valid concept codes is referred to as a value set.
The key terms regarding this proposal are:
• Common CTS Message Elements
• Service Identification Section – common to both message runtime and browsing API
• CTS Message Browsing API (such as looking up a vocabulary domain and looking up a value set).
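The two-layer arrangement described above, in which the Message API uses the Vocabulary API, can be illustrated with a small sketch. The class and method names below are hypothetical and do not reproduce the actual HL7 CTS interfaces; the domain, code system identifier, and codes are made-up examples.

```python
class VocabularyAPI:
    """Lower layer: speaks in code systems, concept codes, designations."""

    def __init__(self, code_systems):
        # code system id -> {concept code: designation}
        self._code_systems = code_systems

    def is_concept_valid(self, code_system_id, code):
        return code in self._code_systems.get(code_system_id, {})

    def designation(self, code_system_id, code):
        return self._code_systems[code_system_id][code]


class MessageAPI:
    """Upper layer: speaks in vocabulary domains and value sets."""

    def __init__(self, vocabulary_api, domains):
        self._vocabulary = vocabulary_api
        # vocabulary domain name -> (code system id, set of allowed codes)
        self._domains = domains

    def validate_code(self, domain, code):
        """A code is valid if the domain's value set lists it and the
        underlying code system (reached through the lower layer) defines it."""
        code_system_id, allowed = self._domains[domain]
        return (code in allowed
                and self._vocabulary.is_concept_valid(code_system_id, code))


if __name__ == "__main__":
    vocabulary = VocabularyAPI({"example-cs": {"F": "Female", "M": "Male"}})
    messages = MessageAPI(vocabulary, {"ExampleGender": ("example-cs", {"F", "M"})})
    print(messages.validate_code("ExampleGender", "F"))
    print(messages.validate_code("ExampleGender", "X"))
```

The message-side caller never touches a code system directly, which is the separation of concerns the two layers are meant to provide.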
A Vocabulary Domain is an abstract conceptual space such as "countries of the world" or "the gender of a person used for administrative purposes". Each Vocabulary Domain has a unique name along with a description of the conceptual space that it represents. Before the values of an attribute can be drawn from this conceptual space, an actual list of concept codes needs to be defined. A list of valid concept codes that are logically related is referred to as a value set. A vocabulary domain must be represented by at least one value set. A value set may include a list of zero or more CodedConcepts drawn from a single CodeSystem. A ValueSet can represent:
• All of the CodedConcepts defined in exactly one CodeSystem
• A specified list of CodedConcepts that are defined in exactly one CodeSystem
• The set of CodedConcepts represented by another ValueSet.
In other words, a value set is ‘a collection of concepts drawn from one or more vocabulary code systems and grouped together for a specific purpose’ (e.g. the "Microorganism" value set derived from the SNOMED-CT code system) (8).
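The three forms a ValueSet can take (a whole CodeSystem, an explicit list from one CodeSystem, or a reference to another ValueSet) can be modelled as follows. This is a sketch under assumptions: the field names and the `expand` function are hypothetical, and the sketch does not detect circular references between value sets.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ValueSet:
    name: str
    code_system: Optional[str] = None        # forms 1 and 2
    codes: Optional[Tuple[str, ...]] = None  # form 2 only: explicit list
    based_on: Optional[str] = None           # form 3: name of another ValueSet


def expand(value_set, code_systems, value_sets):
    """Resolve a ValueSet to the concrete set of concept codes it represents."""
    if value_set.based_on is not None:
        # Form 3: the concepts represented by another ValueSet.
        return expand(value_sets[value_set.based_on], code_systems, value_sets)
    defined = set(code_systems[value_set.code_system])
    if value_set.codes is None:
        # Form 1: all concepts defined in exactly one CodeSystem.
        return defined
    # Form 2: a specified list, all of which must exist in the CodeSystem.
    unknown = set(value_set.codes) - defined
    if unknown:
        raise ValueError(f"codes not in {value_set.code_system}: {sorted(unknown)}")
    return set(value_set.codes)


if __name__ == "__main__":
    code_systems = {"CS": {"a": "A", "b": "B", "c": "C"}}
    value_sets = {
        "all": ValueSet("all", code_system="CS"),
        "some": ValueSet("some", code_system="CS", codes=("a", "b")),
        "alias": ValueSet("alias", based_on="some"),
    }
    for name, vs in value_sets.items():
        print(name, sorted(expand(vs, code_systems, value_sets)))
```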
A value set also has Value Set Concepts. A Value Set Concept is the name for an object or abstract idea that provides a pointer to the code system concept code and/or name (e.g. "Bacillus Anthracis" is a concept in the "Microorganism" value set derived from the SNOMED-CT code system). A value set also has an OID (Value Set OID, the unique object identifier for a Value Set). The metadata and the associations of a value set are presented in Table 1:
A representation of a value set can be:
Value Set Name: Infectious Agent (Microorganism)
Value Set Code: PHVS_InfectiousAgent_CDC
Value Set OID: 2.16.840.1.114126.96.36.1998
Code System Name: SNOMED-CT
Code System Code: PH_SNOMED-CT
Code System OID: 2.16.840.1.113883.6.96
(source: Public Health Information Network)(4).
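The metadata shown above could be held in a small record type. The sketch below is illustrative: the class name, the field names, and the simplified OID syntax check (dot-separated non-negative integers without leading zeros, first arc 0 to 2) are assumptions; the values are copied as they appear in the example above.

```python
import re
from dataclasses import dataclass

# Simplified syntactic check: first arc 0-2, then one or more numeric arcs.
_OID_PATTERN = re.compile(r"[0-2](\.(0|[1-9]\d*))+")


def looks_like_oid(text):
    return _OID_PATTERN.fullmatch(text) is not None


@dataclass(frozen=True)
class ValueSetDescriptor:
    value_set_name: str
    value_set_code: str
    value_set_oid: str
    code_system_name: str
    code_system_code: str
    code_system_oid: str

    def __post_init__(self):
        # Reject records whose identifiers are not syntactically valid OIDs.
        for oid in (self.value_set_oid, self.code_system_oid):
            if not looks_like_oid(oid):
                raise ValueError(f"not a syntactically valid OID: {oid!r}")


if __name__ == "__main__":
    descriptor = ValueSetDescriptor(
        value_set_name="Infectious Agent (Microorganism)",
        value_set_code="PHVS_InfectiousAgent_CDC",
        value_set_oid="2.16.840.1.114126.96.36.1998",
        code_system_name="SNOMED-CT",
        code_system_code="PH_SNOMED-CT",
        code_system_oid="2.16.840.1.113883.6.96",
    )
    print(descriptor.value_set_code, descriptor.code_system_oid)
```

The check is purely syntactic; it does not confirm that an OID is actually assigned to the named value set or code system.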
Since XDS.b uses Web Services, the technical committee may wish to consider Web Services APIs such as the Java API for XML-based RPC (JAX-RPC) 1.1, an API for building and deploying SOAP+WSDL web service clients and endpoints.
The Java API for XML Registries (JAXR) 1.0.4 can also be used to access different kinds of XML registries. It provides a single set of APIs to access a variety of XML registries, including UDDI and the ebXML Registry, without having to know the registry's information model (9).
5. Technical Approach
An approach similar to ITI-XDS is adopted for the distribution of the terminologies, with a focus on the ValueSets used in a common clinical setting.
The HL7 CTS message elements, the metadata of the value set, and the associations it makes with the other components of the Vocabulary Domain must be investigated. The Service Identification Section and the Message Browsing API must also be examined to see exactly how they will affect the transactions between the proposed actors. Since XDS.b uses Web Services, it might also be of interest to examine the use of a Web Services API.
Ultimately the aim would be to handle a whole code system, even one as complex as SNOMED, covering vocabulary domains, contexts, and relations between terminologies such as interface terminologies, including the “processing” terminologies for data mining and Natural Language Processing technologies. These will be treated separately in a White Paper.
For the sake of completeness, all actors are shown, with the understanding that the main transactions concern the Value Sets.
Terminology Source: The actor that is the source of the terminology resource and that receives code systems such as SNOMED or LOINC. This actor is mentioned only for the sake of completeness and will be further described in the White Paper.
Terminology Repository: An actor that stores terminologies received from the Terminology Source. It has the responsibility to register the metadata of the terminology with the Terminology Registry.
Terminology Registry: An actor that keeps track of the terminologies that the Terminology Source actor receives, including the new ones as well as the updates.
ValueSet Source: An actor whose role is to edit "official" ValueSets and maintain a link between the ValueSets and the Terminology Source (its code systems) via the Terminology Repository actor. It creates a brand new Value Set in the Value Set Repository based on the information it retrieved from the Terminology Repository.
ValueSet Repository: An actor whose role is to store the brand new ValueSets that the ValueSet Source has sent, as well as their updates. It also has the responsibility to register the metadata of each new or updated ValueSet it receives from the ValueSet Source.
ValueSet Registry: An actor that keeps track of the metadata belonging to the ValueSets existing in the Value Set Repository. The registered metadata can be queried, namely on the name, the OID, and the Assigning Auth Version. Each new entry or update in the ValueSet Repository will create an entry in the ValueSet Registry.
ValueSet Consumer: An actor that queries the ValueSet Registry and consequently retrieves the ValueSet from the ValueSet Repository. The ValueSet Consumer queries the ValueSet Registry, namely on the name, the OID, and the Assigning Auth Version, so that it can update to the latest version if needed. The ValueSet Consumer will interact with (or may even be) the Content Creator/Consumer so that the latter can use the Value Sets required for encoding. This point of interaction still has to be worked out.
The actors would need to authenticate to one another and comply with the Consistent Time profile (coupled with ATNA and CT transactions).
New transactions (standards used)
ValueSet = VS:
• [Query VS Registry] (between ValueSet Consumer/ValueSet Source and Value Set Registry) List of the Value Sets available for synchronization, with their various versions
• [Retrieve VS] (between Value Set Consumer and Value Set Repository) Acquisition of a Value Set in full mode for a given version (use case: importing a whole Value Set)
• [Retrieve VS] (between Value Set Consumer and Value Set Repository) Acquisition of a Value Set in update mode from one version to another, for example from version 1.3 to 1.5 (use case: updating a Value Set)
• [Provide & Register VS] (between Value Set Source and Value Set Repository) Provide Value Set in update mode for a version
• [Provide & Register VS] (between Value Set Source and Value Set Repository) Provide Value Set (brand new value set)
• [Register VS] (between Value Set Repository and Value Set Registry) Register Value Set for version update
• [Register VS] (between Value Set Repository and Value Set Registry) Register Value Set (brand new value set)
• [Notification VS] (optional) The ValueSet Registry notifies the ValueSet Consumer that an update of the VS is available.
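How the source-side transactions listed above chain together, including the optional notification, can be sketched in memory as follows. The method names simply echo the transaction names; the callback-based notification, the data layout, and the sample OID are assumptions made for illustration, not a proposed wire protocol.

```python
class ValueSetRegistry:
    def __init__(self):
        self._versions = {}     # oid -> list of versions, oldest first
        self._subscribers = []  # callables taking (oid, version)

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def register_vs(self, oid, version):
        """[Register VS]: record metadata for a new or updated value set."""
        self._versions.setdefault(oid, []).append(version)
        for notify in self._subscribers:
            notify(oid, version)  # [Notification VS] (optional)

    def query_vs_registry(self, oid):
        """[Query VS Registry]: list the versions available for an OID."""
        return list(self._versions.get(oid, []))


class ValueSetRepository:
    def __init__(self, registry):
        self._registry = registry
        self._content = {}  # (oid, version) -> {code: display name}

    def provide_and_register_vs(self, oid, version, codes):
        """[Provide & Register VS]: store the content, then register it."""
        self._content[(oid, version)] = dict(codes)
        self._registry.register_vs(oid, version)

    def retrieve_vs(self, oid, version):
        """[Retrieve VS]: return the content of one version."""
        return dict(self._content[(oid, version)])


if __name__ == "__main__":
    registry = ValueSetRegistry()
    repository = ValueSetRepository(registry)
    registry.subscribe(lambda oid, version: print("update available:", oid, version))

    # The ValueSet Source publishes a brand new value set, then an update.
    repository.provide_and_register_vs("1.2.3", "1.3", {"A": "Alpha"})
    repository.provide_and_register_vs("1.2.3", "1.5", {"A": "Alpha", "C": "Gamma"})

    latest = registry.query_vs_registry("1.2.3")[-1]
    print(repository.retrieve_vs("1.2.3", latest))
```

A Consumer that does not subscribe can still poll with `query_vs_registry`, which corresponds to the alternative raised in the open issues.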
The transactions between the actors can be seen in Figures 4 and 5, below:
Fig. 4: Creating a new Value Set from a new terminology and consuming it by an application
Fig. 5: Updating a Value Set to a new version and consuming it by an application
Impact on existing integration profiles
Needs to be paired with profiles CT and ATNA. Will improve the efficiency of coding in the content profiles.
New integration profiles needed
ITI-ST (ITI Sharing Terminologies)
Breakdown of tasks that need to be accomplished
• Detailed description of the current situation in a clinical setting with regards to terminology use.
• Detailed description of what this profile will bring
• Brief description of the whole terminology model
• Characteristics of encoded data
• Implementer’s responsibilities
• Figure out the feasibility of the transactions
• Detailed description of the ValueSet and its characteristics – metadata, content, structure
• Determine which metadata is necessary for registering the Value Set and which metadata is needed for the update.
• Handling Value Set Changes
• Description of the use of obsolete values in the Value Set and how the application should handle them
• Detail the connection between the ValueSet Consumer and the Content Creator/Consumer
• What terminology services functions does this profile offer – name exchange, identifier translation, local information, track version changes (10)
• How will the mapping be done?
• Conformance testing on the Value Set once retrieved
• Create table of dependencies on other profiles
• Follow Context
• Trigger Events
• Message Semantics
• Expected Actions
• Security considerations
• What issues will be addressed not here but in the white paper?
6. Support & Resources
France is in the midst of deploying a PHR (Personal Health Record) at a national scale and is willing to contribute national efforts to the Profile development. Researchers from the INSERM (The National Institute of Health and Medical Research), CNRS (National Center for Scientific Research), the GMSIH (A Public Interest Group in charge of the Modernization of the Healthcare Information Systems), and the Association of Hospitals of Paris, which comprises more than 40 hospitals, are willing to put effort into this profile. Industry is also interested in participating, namely companies such as Thales, Mondeca, and DBmotion.
7. Risks
To be discussed.
8. Open Issues
• What is the connection between a Content Creator/Consumer and a Value Set Consumer? How does the clinical application (Content Creator/Consumer) obtain the new codes or updated values from the Value Set Consumer?
• Should there be a VS Registry Stored Query or just a Query VS Registry
• Should there be a notification mechanism when a new version is available or should the Value Set Consumer do a query?
9. Tech Cmte Evaluation
<The technical committee will use this area to record details of the effort estimation, etc.>
Effort Evaluation (as a % of Tech Cmte Bandwidth):
- 35% for ...
Responses to Issues:
- See italics in Risk and Open Issue sections