Use of free text clinical records in identifying syndromes and analyzing health data
K. Lam, T. Parkin, C. Riggs, K. Morgan
Veterinary Record 161, 547-551
October 20, 2007
As computerized medical records become more widespread, debate continues over how data should be entered into them. Coding is used to identify diseases and clinical signs, and the resulting codes are easily incorporated into databases that form a valuable resource for epidemiological research. However, computerized record systems also allow clinicians to enter data as free text, and this data, despite representing an enormous amount of information, is generally not usable for analysis in its raw form. Natural language processing methods have been successfully developed to extract information from journal articles, but they are less useful for analyzing the clinical language in a patient record, which is much more colloquial. The authors instead propose using content analysis to systematically study free text from clinical records.
Using Provalis Research’s WordStat software to perform a content analysis of free text computerized veterinary records, the authors identified and classified the reasons 3727 thoroughbred horses belonging to the Hong Kong Jockey Club were retired from racing. After building a dictionary of retirement reasons, they were able to assign 95% of the records in five cycles through their categorization process (63% of the total in the first cycle). The remaining 5% were assigned manually, because they involved either rare medical conditions or spelling errors that the software could not interpret. The authors also manually verified each record’s categorization to validate their automated procedure. The entire process took approximately 100 man-hours.
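The cyclical, dictionary-based categorization described above can be illustrated with a minimal Python sketch. It is not WordStat and does not reproduce the authors' procedure; the category names, keywords, sample records, and function names are all hypothetical and chosen only to show how coverage might grow as the dictionary is extended between cycles, and why a misspelled record would fall through to manual review.

```python
import re

# Hypothetical dictionaries: category -> lowercase keywords.
# Cycle 2 holds terms added after reviewing what cycle 1 left unassigned.
DICTIONARY_CYCLE_1 = {
    "tendon injury": {"tendon", "tendonitis"},
    "fracture": {"fracture", "fractured"},
}
DICTIONARY_CYCLE_2 = {
    "tendon injury": {"sdft", "bowed"},
    "respiratory": {"bleeder", "eiph"},
}

# Hypothetical free text retirement notes, keyed by record id.
RECORDS = {
    1: "Retired due to tendonitis of left fore",
    2: "Fractured sesamoid, retired",
    3: "Bowed SDFT right fore",
    4: "Repeated bleeder (EIPH)",
    5: "Retired, tendnitis",  # misspelling: no keyword will match
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in a free text note."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(records, dictionary, assigned=None):
    """Assign each still-unassigned record to the first category whose
    keywords overlap the record's words; earlier assignments are kept."""
    assigned = dict(assigned or {})
    for record_id, text in records.items():
        if record_id in assigned:
            continue
        words = tokenize(text)
        for category, keywords in dictionary.items():
            if words & keywords:
                assigned[record_id] = category
                break
    return assigned


def run_cycles(records, dictionaries):
    """Apply the dictionaries in order, tracking cumulative coverage."""
    assigned = {}
    coverage = []
    for dictionary in dictionaries:
        assigned = categorize(records, dictionary, assigned)
        coverage.append(len(assigned) / len(records))
    unassigned = [rid for rid in records if rid not in assigned]
    return assigned, coverage, unassigned


if __name__ == "__main__":
    assigned, coverage, unassigned = run_cycles(
        RECORDS, [DICTIONARY_CYCLE_1, DICTIONARY_CYCLE_2]
    )
    print("Assigned:", assigned)
    print("Cumulative coverage per cycle:", coverage)
    print("Left for manual review:", unassigned)
```

In this toy data the misspelled record is never matched, which mirrors the kind of case the authors report assigning by hand. A real dictionary would need far more terms and some handling of multi-word phrases and negation.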
The authors designed this study to demonstrate the successful use of content analysis in systematically describing the vast amount of free text information contained in computerized medical records. Automated classification of this data, the amount of which precludes manual analysis, promotes its use in epidemiological initiatives such as analyzing public health data or targeting areas for further research.
Note: The authors report very high satisfaction with the WordStat software.
Cecelia Madison
OHSU BMI 512 Winter 2008