i2b2: Informatics for Integrating Biology and the Bedside
i2b2 was formed in response to a 2004 NIH Roadmap Initiative request for applications (RFA) intended to encourage high-risk, transformative, collaborative research within academic medical centers that would address areas of incomplete knowledge. The i2b2 program was developed by Partners HealthCare System in Boston and became one of the first four NIH National Centers for Biomedical Computing (NCBC) in 2004. (There are now seven NCBCs.) The i2b2 Center includes investigators from Harvard-affiliated hospitals, MIT, the Harvard School of Public Health, Harvard Medical School, and the Harvard/MIT Division of Health Sciences and Technology.
The i2b2 Center draws on anonymized data from the medical records of 2.5 million patients in the Partners HealthCare System in Boston. Its work is organized into five cores:
Core 1: Creation of new computational paradigms
Core 2: Driving Biology Projects
Core 3: Development of new methodologies and software
Core 4: Training program development
Core 5: Dissemination of i2b2 software tools
i2b2 Driving Biology Projects (Core 2)
Driving Biology Projects (DBPs) are disease-based projects selected according to criteria such as the clinical significance of the disease, the availability of a research model for the disease that combines genome technologies with clinical data types, and a focus on complex diseases with evidence of genetic components.
Examples of past Driving Biology Projects include:
Airway Diseases DBP (genetics and pharmacogenetics)
Methods and issues addressed included the use of natural language processing (NLP) to identify subcategories, or "subphenotypes," of asthma in a database of 97,000 patients with asthma, and to identify and recruit subjects for DNA collection and genotyping (1). The project developed epidemiologic methods to detect asthma phenotypes based on disease severity, and then planned to apply bioinformatics methods, such as searching for single nucleotide polymorphisms (SNPs) that predict specific phenotypes like disease severity (2,3).
Huntington’s Disease DBP
The goal of this DBP was to identify gene expression phenotypes important in the modification or treatment of the disease.
The project produced tools for a genomic Hive Cell Suite (a plug-and-play architecture) that helps investigators use existing data repositories to support discoveries in diseases with known genetic components.
Findings included evidence that the HD CAG repeat (a run of CAG trinucleotide repeats encoding the polyglutamine tract known to play a role in HD progression) alters extra-mitochondrial energy homeostasis pathways (4), and that RNA "expression signatures" (profiles) in white blood cells vary with CAG repeat length (5,6).
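As a toy illustration of what "CAG repeat length" counts, the following Python sketch finds the longest uninterrupted run of CAG triplets in a DNA string. The function name `longest_cag_run` and the example sequence are hypothetical; this is a simple string scan for illustration only, not a genotyping method and not how the cited studies measured repeat length.

```python
def longest_cag_run(seq: str) -> int:
    """Return the largest number of consecutive CAG triplets in seq.

    Hypothetical helper for illustration; it treats the input as a plain
    string and does not model sequencing error or repeat interruptions.
    """
    seq = seq.upper()
    best = 0
    i = 0
    while i <= len(seq) - 3:
        if seq[i:i + 3] == "CAG":
            # Count consecutive CAG triplets starting at position i.
            run = 0
            j = i
            while seq[j:j + 3] == "CAG":
                run += 1
                j += 3
            best = max(best, run)
            i = j  # resume scanning just past this run
        else:
            i += 1
    return best


if __name__ == "__main__":
    print(longest_cag_run("ATGCAGCAGCAGTTCAGCAG"))  # 3
```

The scan jumps past each complete run because no other CAG run can begin inside one; an offset of one or two bases within a run never reads "CAG".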
Current Driving Biology Projects (DBPs) include:
Autoimmune and cardiovascular (CV) diseases
Diabetes and cardiovascular (CV) diseases
i2b2 Computational Tools (Cores 3 and 5)
i2b2 provides a series of computational tools.
i2b2 NLP Research Data Sets (Core 4)
i2b2 provides outside investigators with access to research datasets.
At the time of writing, the latest i2b2 software version was 1.7.
An example of a paper that uses i2b2: "Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields."
1. Zeng QT, Goryachev S, Weiss ST, Sordo M, Murphy SN, Lazarus R. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med Inform Decis Mak. 2006;6:30.
2. Himes BE, Kohane IS, Ramoni MF, Weiss ST. Characterization of patients who suffer asthma exacerbations using data extracted from electronic medical records. AMIA Annu Symp Proc. 2008 Nov 6:308-12. PMID:18999057.
3. Himes BE, Dai Y, Kohane IS, Weiss ST, Ramoni MF. Prediction of chronic obstructive pulmonary disease (COPD) in asthma patients using electronic medical records. J Am Med Inform Assoc. 2009;16(3):371-9.
4. Lee JM, Ivanova EV, Seong IS, Cashorali T, Kohane I, Gusella JF, MacDonald ME. Unbiased gene expression analysis implicates the huntingtin polyglutamine tract in extra-mitochondrial energy metabolism. PLoS Genet. 2007;3(8):e135. PMID:17708681.
5. Gusella JF, MacDonald ME. Genetic criteria for Huntington's disease pathogenesis. Brain Res Bull. 2007;72:78-82. PMID:17352930.
6. Jacobsen JC, Gregory GC, Woda JM, Thompson MN, Coser KR, Murthy V, Kohane IS, Gusella JF, Seong IS, MacDonald ME, Shioda T, Lee JM. HD CAG-correlated gene expression changes support a simple dominant gain of function. Hum Mol Genet. 2011;20(14):2846-60. Epub 2011 May 2. PMID:21536587.