Factor analysis

From Clinfowiki

Factor analysis is a statistical approach for analyzing the interrelationships among a large number of variables and explaining them in terms of their common underlying dimensions. The inferred underlying variables, which are not observed directly, are called factors. The approach condenses the information contained in the original variables into a smaller set of factors with minimal loss of information.

A typical factor analysis suggests answers to four major questions:

  1. How many different factors are needed to explain the pattern of relationships among these variables?
  2. What is the nature of those factors?
  3. How well do the hypothesized factors explain the observed data?
  4. How much purely random or unique variance does each observed variable include?
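
As a rough illustration of the fourth question, the sketch below splits each variable's variance into a common part (the communality) and a unique part. It assumes standardized variables and uncorrelated factors, and the loading values are hypothetical numbers made up for this example, not results from any study.

```python
# Hypothetical standardized loadings: four observed variables on two factors.
# These values are invented for illustration only.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]


def communality(row):
    """Share of a standardized variable's variance explained by the factors.

    Assumes the factors are uncorrelated, so squared loadings simply add up.
    """
    return sum(loading * loading for loading in row)


communalities = [communality(row) for row in loadings]
# Whatever the common factors do not explain is unique (specific or random) variance.
uniquenesses = [1.0 - c for c in communalities]

for i, (c, u) in enumerate(zip(communalities, uniquenesses), start=1):
    print(f"variable {i}: communality = {c:.2f}, uniqueness = {u:.2f}")
```

In this made-up case the fourth variable has the largest unique variance, so the two factors say the least about it.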

History

If a statistical method can have an embarrassing history, factor analysis is that method. Around 1950, the reputation of factor analysis suffered from over-promotion by a few overenthusiastic partisans. In retrospect, there were three things wrong with the way some people were thinking about factor analysis at that time.

First, some people seemed to see factor analysis as "the" statistical method rather than "a" statistical method.

Second, they were thinking in absolute terms about problems for which a heuristic approach would have been more appropriate.

Third, they were thinking of overly broad sets of variables ("we want to understand all of human personality" rather than "we want to understand the nature of curiosity").

Thus, in three different ways, they were attempting to stretch factor analysis further than it was capable of going. In recent decades factor analysis seems to have found its rightful place as a family of methods that is useful for certain limited purposes.

Principal use

Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; it is used to study the patterns of relationship among many dependent variables, with the main goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly. Thus answers obtained by factor analysis are necessarily more hypothetical and tentative than is true when independent variables are observed directly.

Advantages

  • Identification of groups of inter-related variables, to see how they are related to each other.
  • Reduction of number of variables, by combining two or more variables into a single factor. For example, performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor such as general athletic ability.
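
The athletic ability example can be sketched with simulated data. The code below assumes a single latent "ability" factor and hypothetical loadings for the five tests (none of these numbers come from the source). Under that assumption, the correlation between any two tests should be close to the product of their loadings, which is the kind of pattern that lets several variables be summarized by one factor.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical loadings of five athletic tests on one latent "ability" factor.
loadings = {
    "running": 0.8,
    "throwing": 0.7,
    "batting": 0.6,
    "jumping": 0.75,
    "lifting": 0.5,
}


def simulate(n):
    """Draw n athletes; each score = loading * ability + unique noise."""
    data = {name: [] for name in loadings}
    for _ in range(n):
        ability = random.gauss(0, 1)
        for name, loading in loadings.items():
            # Noise is scaled so that every test has unit variance.
            noise = random.gauss(0, 1) * (1 - loading ** 2) ** 0.5
            data[name].append(loading * ability + noise)
    return data


def correlation(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


data = simulate(5000)
names = list(loadings)
observed = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        observed[(a, b)] = correlation(data[a], data[b])
        implied = loadings[a] * loadings[b]
        print(f"{a:>8} vs {b:<8} observed r = {observed[(a, b)]:.2f}, "
              f"implied r = {implied:.2f}")
```

Real data would not arrive with known loadings, of course; estimating them from the observed correlations is what a factor analysis routine does.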

Shortcomings

  • "...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did not prove to be useful in distinguishing among theories." (Sternberg, 1977). In other words, different rotations correspond to different hypothesized underlying processes, yet every rotation is an equally valid outcome of the standard factor analysis optimization. It is therefore impossible to choose the proper rotation using factor analysis alone.
  • Factor analysis can be only as good as the data allow. In psychology, where researchers have to rely on measures of varying validity and reliability, such as self-reports, this can be problematic.
  • Interpreting factor analysis relies on a “heuristic”, which is a solution that is "convenient even if not absolutely true" (Richard B. Darlington). More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality.
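
The rotation problem in the first point can be checked numerically. The sketch below assumes a two-factor orthogonal model with hypothetical loadings (invented for illustration) and shows that rotating the factor axes changes the loadings, and therefore the interpretation, while leaving the correlations implied by the model unchanged.

```python
import math

# Hypothetical loadings: four variables on two uncorrelated factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]


def rotate(matrix, degrees):
    """Rotate the two factor axes by the given angle (orthogonal rotation)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in matrix]


def implied_common(matrix):
    """Common-factor part of the implied correlations: L times L transposed."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in matrix]
            for ri in matrix]


rotated = rotate(loadings, 30)
original_fit = implied_common(loadings)
rotated_fit = implied_common(rotated)

# Largest difference between the two sets of implied correlations.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(original_fit, rotated_fit)
               for a, b in zip(row_a, row_b))

print("rotated loadings:", [[round(v, 2) for v in row] for row in rotated])
print("largest change in implied correlations:", max_diff)
```

Because the fit is identical for every angle, the data alone cannot favor one orientation; criteria such as varimax pick a rotation for interpretability, not for better fit.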

Examples in Informatics

T. Lee: Evaluation of computerized nursing care plan: Instrument development. Journal of Professional Nursing, Volume 20, Issue 4, Pages 230-238.

With the increasingly popular use of information technology in patient care, the need for reliable instrumentation to evaluate information systems has become critical. This article describes the psychometric testing of a scale developed to evaluate a computerized nursing care plan (CNCP) system. A review of the literature generated a 44-item questionnaire, which was then administered to a convenience sample of 729 hospital nurses in Taiwan. Factor analysis (principal component analysis with varimax rotation) and item analysis were applied to establish the scale's construct validity and reliability. Twenty-two items selected from the original 44-item pool were grouped into 6 major constructs: patient care, nursing efficiency, professionalism, usage benefit, education and training, and usability. The α coefficient was 0.85. The statistical results showed that nurses generally valued using the CNCP system. Further psychometric analysis of the scale is suggested in other nursing populations, for subscale development and to refine item wording.
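
The α coefficient reported in this abstract is a reliability measure for a set of questionnaire items. As a minimal sketch, assuming it refers to Cronbach's alpha in its usual form (the abstract does not spell out the formula), it could be computed as below. The function name and the response data are hypothetical and are not taken from the study.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per questionnaire item, all of the same
    length (one entry per respondent).
    """
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(statistics.variance(col) for col in items)
    # Variance of each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: three items answered by five respondents.
responses = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is what one would hope for when items are grouped into a construct by factor analysis.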

Rafal Kustra, Romy Shioda, Mu Zhu: A factor analysis model for functional genomics. BMC Bioinformatics 7: 216 (2006)

Abstract: Background: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories.

Results: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene-Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach while our total computation time was faster by a factor of 4000. We discuss the unique challenges in performance evaluation of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance.

Conclusion: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions.

References

  1. http://en.wikipedia.org/wiki/Factor_analysis
  2. http://www.psych.cornell.edu/Darlington/factor.htm