Caveats for the Use of Operational Electronic Health Record Data in Comparative Effectiveness Research

From Clinfowiki

This is a review of an article by Hersh et al. (2013) titled Caveats for the Use of Operational Electronic Health Record Data in Comparative Effectiveness Research. [1]


The adoption of electronic health records (EHRs) not only holds the promise of improving the quality, safety, and cost of health care; it also has the potential to advance biomedical science and practice through the re-use of clinical data. Federal programs that facilitate the re-use of clinical data for comparative effectiveness research (CER) include the Clinical and Translational Science Awards (CTSA) and the Strategic Health IT Advanced Research Projects (SHARP). Successes have come from the Electronic Medical Records and Genomics (eMERGE) Network and the HMO Research Network's Virtual Data Warehouse (VDW) project. However, re-using EHR data to advance clinical research can be challenging because EHR data were not collected for research purposes. The authors describe the caveats of using EHR data for clinical research, provide an informatics framework for better understanding these caveats, and offer recommendations for improving the re-use of EHR data for research.


  • Caveat #1: EHRs may contain inaccurate (or incorrect) data

Errors in EHRs can be introduced at any point in the clinical care and documentation process. Studies of EHR data quality have varied tremendously in the data elements, study settings, populations, clinical conditions, and EHR systems they examined. Some studies identified five categories of problematic data: missing data, erroneous data, uninterpretable data, inconsistent data, and data that are inaccessible because they are recorded only in text notes.
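
To make these categories more concrete, the following is a minimal Python sketch of rule-based checks for four of the five categories (missing, erroneous, inconsistent, and uninterpretable data) on a single encounter record. The `VisitRecord` fields, the plausible blood-pressure range, and the smoking-status code set are hypothetical choices for illustration, not rules taken from Hersh et al.; the fifth category, data locked in text notes, cannot be detected by field-level checks like these.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical code set for a coded field; real systems use their own vocabularies.
SMOKING_CODES = {"current", "former", "never"}


@dataclass
class VisitRecord:
    """A simplified, hypothetical EHR encounter row."""
    patient_id: str
    birth_date: Optional[date]
    visit_date: Optional[date]
    systolic_bp: Optional[float]      # mmHg
    smoking_status: Optional[str]     # expected to be one of SMOKING_CODES


def quality_flags(rec: VisitRecord) -> list[str]:
    """Return the data-quality problem categories detected in one record."""
    flags = []
    # Missing data: a required field was never filled in.
    if rec.birth_date is None or rec.visit_date is None or rec.systolic_bp is None:
        flags.append("missing")
    # Erroneous data: a value outside an assumed plausible range (50-300 mmHg).
    if rec.systolic_bp is not None and not (50 <= rec.systolic_bp <= 300):
        flags.append("erroneous")
    # Inconsistent data: two fields that cannot both be true (visit before birth).
    if (rec.birth_date is not None and rec.visit_date is not None
            and rec.visit_date < rec.birth_date):
        flags.append("inconsistent")
    # Uninterpretable data: a coded field holding a value outside its code set.
    if rec.smoking_status is not None and rec.smoking_status not in SMOKING_CODES:
        flags.append("uninterpretable")
    return flags


if __name__ == "__main__":
    bad = VisitRecord("p1", date(1980, 5, 1), date(1975, 1, 1), 400.0, "??")
    print(quality_flags(bad))  # ['erroneous', 'inconsistent', 'uninterpretable']
```

Checks like these only catch problems that are visible within a record; a plausible but wrong value (for example, a blood pressure transcribed from the wrong patient) passes every rule.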

  • Caveat #2: EHRs often do not tell a complete patient story

Patients may receive care at different healthcare organizations or be lost to follow-up, so the data recorded in a patient's record at a single institution may be incomplete. As a result, EHR data alone may not be sufficient to answer an investigator's questions. Studies have shown that incomplete data and significant variability in data quality can lead to biased conclusions in outcomes research.

  • Caveat #3: Many of the data have been transformed/coded for purposes other than research and clinical care

Transforming or coding data can make the underlying information inaccessible. Errors or biases can be introduced in the coding process for many reasons: inadequate or incomplete documentation, lack of access to information by clinicians and/or coders, illegible handwriting, suboptimal training and experience of coders, upcoding, inadequate review by clinicians, and changes in coding practices, or even improvements in coding over time.

  • Caveat #4: Data captured in clinical notes (text) may not be recoverable for CER

Much clinical data is recorded only in narrative text reports, which are information-rich sources but from which data often cannot be readily retrieved. Although natural language processing (NLP) is a promising approach for recovering such data for research, its performance is far from perfect.
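
To illustrate why simple rule-based extraction from notes is imperfect, the following minimal Python sketch checks whether a condition is mentioned without a preceding negation cue in the same sentence, loosely in the spirit of trigger-based negation rules such as NegEx. The cue list and the sentence splitting are deliberately simplistic assumptions, not a method described in the review; the examples show correct results alongside two typical errors.

```python
import re

# Hypothetical, deliberately small list of negation cues.
NEGATION_CUES = ("no ", "denies ", "negative for ", "without ")


def mentions_condition(note: str, term: str) -> bool:
    """Return True if `term` occurs in some sentence with no negation cue
    appearing earlier in that sentence."""
    term = term.lower()
    for sentence in re.split(r"[.;\n]", note.lower()):
        pos = sentence.find(term)
        if pos == -1:
            continue
        # Only the text before the term is checked for a negation cue.
        if not any(cue in sentence[:pos] for cue in NEGATION_CUES):
            return True
    return False


if __name__ == "__main__":
    print(mentions_condition("Patient denies chest pain.", "chest pain"))        # False (correct)
    print(mentions_condition("Reports chest pain on exertion.", "chest pain"))   # True (correct)
    # "No" negates fever, not chest pain, but the rule cannot tell: a false negative.
    print(mentions_condition("No fever but reports chest pain.", "chest pain"))  # False (wrong)
    # The chest pain belongs to the mother, not the patient: a false positive.
    print(mentions_condition("Mother had chest pain.", "chest pain"))            # True (wrong)
```

Each fix (negation scope, experiencer, historical versus current mentions) adds more rules, which is one reason extraction accuracy from text remains a limiting factor for CER.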

  • Caveat #5: EHRs may present multiple sources of data that affect data provenance

Data provenance is the understanding of the authoritative or definitive source(s) of a given measure or indicator of interest, given the existence of multiple potential sources for such a variable. It is concerned with establishing and systematically using a data management strategy that ensures that definitive findings are derived from multiple potential source data elements in a logical and reproducible manner.
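
One way to make a provenance rule logical and reproducible is to write it down as code. The following minimal Python sketch resolves a single value for a variable from several candidate sources using a fixed precedence order, then recency, and keeps the chosen source attached to the result. The source names, their ranking, and the HbA1c values are hypothetical; deciding which source is authoritative for each variable is a study-specific judgment that the review leaves to investigators.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical precedence: a lower number means a more authoritative source.
SOURCE_PRIORITY = {"lab_interface": 0, "outside_records": 1, "note_nlp": 2}


@dataclass(frozen=True)
class Observation:
    variable: str
    value: float
    source: str
    observed_on: date


def resolve(observations: list[Observation], variable: str) -> Optional[Observation]:
    """Pick one definitive observation for `variable` using a fixed, documented
    rule: most authoritative source first, then most recent date."""
    candidates = [o for o in observations
                  if o.variable == variable and o.source in SOURCE_PRIORITY]
    if not candidates:
        return None
    # Sort key: source rank ascending, then date descending (via negated ordinal).
    return min(candidates,
               key=lambda o: (SOURCE_PRIORITY[o.source], -o.observed_on.toordinal()))


if __name__ == "__main__":
    obs = [
        Observation("hba1c", 7.9, "note_nlp", date(2012, 3, 1)),
        Observation("hba1c", 8.1, "lab_interface", date(2012, 1, 15)),
        Observation("hba1c", 7.4, "lab_interface", date(2011, 6, 2)),
    ]
    chosen = resolve(obs, "hba1c")
    print(chosen.value, chosen.source)  # 8.1 lab_interface
```

Because the returned object carries its `source`, every value used in an analysis can be traced back to where it came from, and rerunning the rule on the same inputs always yields the same choice.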

  • Caveat #6: Data granularity in EHRs may not match the needs of CER

EHR data may not be detailed and specific enough for CER. For example, diagnosis codes assigned for billing purposes often describe a broad set of conditions rather than the specific subset that a comparative effectiveness study needs.
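
The following minimal Python sketch shows how coarse codes leave some patients impossible to classify when a study targets a specific subset. ICD-10-CM heart failure codes are used purely as an illustration (I50.2x for systolic heart failure, I50.9 for unspecified heart failure); the review does not name specific codes, and treating a ".9" suffix as "unspecified" is a simplification that does not hold across the whole classification.

```python
def classify_for_cohort(code: str, target_prefix: str = "I50.2") -> str:
    """Classify one diagnosis code against a specific target subset.

    Returns "in" if the code falls within the target subset, "indeterminate"
    if it is a broader code that may or may not describe the subset,
    and "out" otherwise.
    """
    code = code.strip().upper()
    if code.startswith(target_prefix):
        return "in"
    # Simplification: treat the bare category (e.g. "I50", as may appear in
    # coarsened data) and its ".9" "unspecified" code as too coarse to decide.
    category = target_prefix.split(".")[0]
    if code in (category, category + ".9"):
        return "indeterminate"
    return "out"


if __name__ == "__main__":
    codes = ["I50.22", "I50.9", "I50.32", "E11.9"]
    print([classify_for_cohort(c) for c in codes])
    # ['in', 'indeterminate', 'out', 'out']
```

Patients in the "indeterminate" group are exactly those for whom the billing-level code is too coarse; placing them would require another data source, for example ejection fraction documented in echocardiography reports.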

  • Caveat #7: There are differences between research protocols and clinical care

Clinical care and research differ in both methods and purposes. The data idiosyncrasies that arise when operational EHR data are used for CER include diagnostic uncertainty, diagnosis timing, treatment choice and timing, treatment decisions, and treatment follow-up.

Informatics framework for addressing caveats

The authors proposed using the data-information-knowledge continuum of biomedical informatics to categorize the major issues in re-using EHR data for CER. They also advocated policies promoting the adoption of standards-based, interoperable healthcare data and the use of data integrated from multiple sources. The authors hope that attention to these caveats will allow EHR data to inform not only the health of individuals but also the functioning of the larger health care system.

My comments

The caveats the authors categorized should be helpful for researchers who plan to use EHR data. However, I felt that the proposed informatics framework is too general to be practical. A more specific mapping between the identified caveats and the data-information-knowledge paradigm may be worth investigating further for practical use.

Related Articles

EHR-enabled Research


  1. Hersh, W. R., Weiner, M. G., Embi, P. J., Logan, J. R., Payne, P. R., Bernstam, E. V., ... & Saltz, J. H. (2013). Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical Care, 51(8 Suppl 3), S30-S37.