The Cancer Journal - Volume 9, Number 3 (May-June 1996)
Some thoughts on correspondence factor analysis
The validity of results obtained in clinical research usually rests on the statistical analysis of data. Simple protocols use tests for comparing means or frequencies and, less frequently, seek correlations among variables. The most elaborate protocols use methods of multivariate analysis such as principal components analysis, multiple regression, Cox's proportional hazards model or discriminant analysis. Although all these methods are necessary and useful, they have limits that other methods can overcome.
A paper in this issue of the Journal has employed correspondence factor analysis (CFA) and we shall therefore attempt to explain the interest of this method for the analysis of clinical, technical and therapeutic data. CFA is a mathematical tool that was developed some time ago but that has been, so far, little used in medicine.
In what ways is CFA different from other methods of data analysis?
1 - Readers familiar with the use of data bases know that many individuals and many items describing each individual (signs, decisions, results...) can be introduced into a data base and that, afterwards, both numerical and non-numerical data can be classified in any number of ways by category of individual or category of item. The two-way entry table of a CFA can include as many individuals - one per row - and as many items - one per column - as desired.
A single CFA table can include:
- both healthy individuals and patients;
- one and the same individual in several rows, one row per stage of disease progression;
- individuals with different diagnoses and/or treatments or individuals with no firm diagnosis.
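To make the two-way table concrete, the following is a minimal sketch of the standard computation behind a correspondence analysis, written with NumPy. It is an illustration only and not necessarily the procedure used in the paper in this issue: the function name `correspondence_analysis` is hypothetical, and the small table (five individuals in rows, four items in columns) contains invented values. The sketch assumes non-negative entries and no empty row or column.

```python
import numpy as np

def correspondence_analysis(table):
    """Sketch of correspondence analysis of a non-negative two-way table.

    Returns row coordinates, column coordinates and the inertia of each axis.
    Assumes every row and every column has a non-zero total.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()              # table expressed as proportions
    r = P.sum(axis=1)            # row masses (weight of each individual)
    c = P.sum(axis=0)            # column masses (weight of each item)
    # Standardised residuals: departure of the table from independence
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]      # principal coordinates of rows
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # principal coordinates of columns
    return rows, cols, sv ** 2

# Invented example: 1 = item present for that individual, 0 = absent
table = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
rows, cols, inertia = correspondence_analysis(table)
print(rows[:, :2])   # positions of the individuals on the first two axes
print(cols[:, :2])   # positions of the items on the same two axes
```

Because individuals and items receive coordinates on the same axes, both can be plotted on one map, which is what allows rows and columns to be read together.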
2 - Each new individual added to the core data base can be submitted to a personalized analysis with results that will benefit him or her directly as regards the establishment of a diagnosis category and the formulation of analytical, prognostic and therapeutic decisions. The greater the number of individuals in the data base and the fewer the number of missing values, the greater its diversity, utility and operational value. In this respect, the data base operates like the inference engine of an artificial intelligence system.
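One common way to position a new individual relative to an existing analysis is to project his or her row onto the axes already computed from the core table (a "supplementary row"). The sketch below illustrates this under stated assumptions: the function names `fit_ca` and `project_new_individual` are hypothetical, the data are invented, and `n_axes` must not exceed the number of axes with non-zero inertia. Note that a supplementary row does not alter the axes; rebuilding the analysis with the new individual included is a separate step.

```python
import numpy as np

def fit_ca(table, n_axes=2):
    """Fit a correspondence analysis on the core table (sketch).

    Returns the row principal coordinates and the column standard
    coordinates on the first n_axes axes.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    col_std = Vt.T[:, :n_axes] / np.sqrt(c)[:, None]
    return rows, col_std

def project_new_individual(new_row, col_std):
    """Place a new individual on the existing axes without refitting."""
    x = np.asarray(new_row, dtype=float)
    profile = x / x.sum()        # the individual's profile over the items
    return profile @ col_std     # weighted average of the item positions

# Invented core table: individuals in rows, items in columns
core = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
rows, col_std = fit_ca(core, n_axes=2)

# Hypothetical new patient described by the same four items
new_coords = project_new_individual([0, 1, 0, 1], col_std)
print(new_coords)
```

The new individual's position can then be compared with those of the individuals already in the data base, for example to see which existing profiles lie closest.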
3 - Because CFA can embrace patients belonging to different disease categories within a single analytical procedure, it is clearly well suited to exploring co-morbidity. In present-day medicine, the ageing of the population means that the number of patients presenting several diseases simultaneously is on the increase. An interdisciplinary data base, built without regard for the division of pathology into specialities, permits the comparison not of characteristics alone but of profiles of characteristics.
During the seconds that the computer handles the data, CFA reproduces a process that is partially analogous to the invention of anatomo-clinical medicine, i.e., the grouping together of signs into syndromes or diseases. The profile of each new individual is added to existing profiles thereby slightly modifying the structure of the whole system and also enhancing its validity. In the same way, each patient, as long as his or her case is adequately described in the literature, increases the body of medical knowledge.
4 - With CFA, one can form:
- homogeneous groups of patients, on the basis of chosen characteristics, in line with the traditionally accepted classification or in anticipation of new modes of classification;
- groups of characteristics in terms of their resemblances and interconnections;
- mixed groups of individuals and characteristics that could be termed 'explained neighbourhoods' where the characteristics embraced by the group explain the individuals contained within it.
Each patient retains his individuality and can be followed by CFA throughout the course of his illness. This is reflected in a trajectory on the CFA maps. As in traditional medicine but with the added advantage of personalisation, the physician can, at any time and disease stage, make subtle adjustments to the treatment, correct the trajectory and lead the patient towards the chosen goal, that is, cure or, failing that, improvement.
Because it is largely free of the constraints imposed by an a priori hypothesis (as regards the distribution [normality] of variables, the relationships among them and the disease classification system), CFA can function as a generator of hypotheses and, in this respect, resembles neural networks or artificial intelligence models. CFA is not, however, a substitute for the intellectual effort required of the physician; nor can it, or for that matter any other method, divest him of the responsibility for his decisions.
Medicine belongs to the world of inexact sciences. Often disappointed by the decision aids promised by over-triumphant computer experts and statisticians, practitioners have the right to demand analytical methods that are adapted to the fuzzy logic of their most excellent peers. Of course, CFA cannot be the universal answer to the study of the complex phenomena - determined but unpredictable - that characterize humans. Humans are organisms living in a complex physico-chemical environment rendered even more complex by the ubiquitous presence of other living organisms and of cultural elements inextricably intermingled around and within each of them.
Modern medicine requires:
- a rejection of the binary 'yes/no' type of behaviour which characterizes, amongst others, the most sectarian proponents of molecular biology, whose tenets today belong to normative genetics;
- a re-evaluation of the imprecise imbricated frontier between the normal and the pathological;
- mastering complexity by reducing the dimensionality of phenomena and not by isolating individuals or items from their context;
- an economy of information, obtained by identifying dependent information that constitutes background noise and whose production and analysis cost time and money (a present-day version of Occam's principle);
- a relationship between practitioners and technical decision aids that is logical:
  - exploration of the field and determination of its limits,
  - varied methods of practice that exclude none since a priori none is without interest although none is universal,
  - mastery of technique and control over a responsible decision by the practitioner even when this decision is efficiently computer aided because computers are machines to be held in slavery.
Jean-Yves Bansard, Michel Kerbaol, Jean-Claude Salomon
Doré J-C. Mastering the complexity of information in patient health care. Science Tribune, http://www.tribunes.com/tribune/art96/erod.htm