Science Tribune - Article - August 1996
National scientific endeavour from 1981 to 1992
E-mail : email@example.com.
This article is based on a multivariate analysis of the scientific publication output of 48 nations performed by Jean-Christophe Doré in collaboration with Jean-François Miquel and Yoshiko Okubo (see bibliography at the end of the article). Its aim is to show how a total of over 6 million entries in a bibliometric database (from the Institute for Scientific Information, Philadelphia, PA, USA) can be represented by a single tree-like diagram which illustrates how close (i.e. similar) the nations are to each other when one compares their publication profiles covering 18 disciplinary areas.
In a world where ever more widespread and faster communication systems are breaking down an increasing number of barriers between nations, how is it possible to obtain an instant snapshot of the scientific competitiveness of all nations in the absence of established and validated indicators ? A comparison of national economic input into science and technology (funds, trained personnel, laboratory space, etc.) or of productivity output (patents, technological transfers, bibliographic references, etc.), measured in absolute values, is not as meaningful as one might expect because of the great differences in the size, population, and economic and political clout of nations. On the other hand, devising 'national signatures', which reflect the way in which nations distribute their overall scientific effort among various disciplines, might be a first step toward an objective overview of world science. This could then be used by historians of science and policy makers alike as a common discussion document to voice their opinions and proposals. To focus on individual national specificities without placing them in a worldwide perspective ignores information that may prove revealing. Let's get a bird's eye view !
In this article, I shall illustrate a simple way of drawing a picture of world science (excluding social studies and humanities). It is based on an analysis of a bibliometric database obtainable from the Institute for Scientific Information (ISI) (Philadelphia, USA) which covers scientific publications (articles, proceedings, notes and reviews) by over 200 world regions or institutions (mostly nations) in 18 disciplinary areas over a period of 12 years (1981-1992) (over 6 million entries). The multivariate method of data analysis was the minimum spanning tree method which links all nations into a single descriptive network on the basis of their overall similarities and dissimilarities in publication patterns. This will be explained in more detail below. A counterpart network linking all disciplines depending upon the way they are viewed by the different nations can also be constructed but will not be discussed.
Why choose publications as a criterion for the assessment of world science ?
The obvious answer to this question is that to analyse world science as objectively as possible, it is best to have hard data and that publication counts are some of the most systematic and comparable data that are readily available. Admittedly, many publications are not vectors of important scientific information but essentially instruments to enhance individual status. Furthermore, diffusion of knowledge is not equivalent to knowledge even though science is hardly ever performed in an intellectual vacuum. However, even if publications are not the most reliable gauge of quality or performance, the act of publication is in itself a telling tale. A publication, regardless of its intrinsic scientific value, reflects an interest in the discipline in question, a material capacity to perform research, a quality of intellect that characterizes the wider elite of a nation, and a wish to be known. Thus, overall, publication counts reflect the resources (natural resources, man-made infrastructures, qualified personnel), levels of investment (government and private), policy goals (official and unavowed), open-mindedness and culture of a nation. By establishing the publication pattern of a nation, we obtain a 'signature' of its scientific interests and capacities in various disciplines.
Advantages and shortcomings of the ISI database
All databases have idiosyncrasies and shortcomings and the ISI database is no exception. In anticipation of criticism, we shall list some of its shortcomings, but first let's not forget its advantages for drawing a picture of world science. It is the world's largest database of scientific publications and, unlike most other databases, covers a wide variety of scientific disciplines. It is a long-standing database that has not undergone major changes. It includes the so-called "top" 6% (about 3200) of all science journals. The fairly recent availability to the public of publication counts covering a time-span of 12 years (1981-1992) seemed particularly suited to a preliminary analysis of world science since, despite accelerating scientific competitiveness, national science policies take time to be formulated and even longer to be put into practice.
By what right and on what basis can anyone judge which are the top national science journals ? ISI has chosen citation frequency as a criterion for the objective assessment of articles. This is a reasonable choice although the reasons for citing an article are diverse and do not always include quality. The criteria used by ISI to select its journals are less transparent and, at the very least, there seems to be a bias toward the English language at the possible expense of scientific content. Nevertheless, it is surprising and also somewhat disconcerting to note that the virtual monopoly exerted by ISI in bibliometry is endorsed by a majority of scientists and science policy makers, even to the extent that some 'foreign' nations use ISI-based criteria to evaluate their national research institutions. There is a tacit acceptance of the American value judgment and, apparently, little serious competition in elaborating databases outside the sphere of US hegemony. Does ISI overpromote American science, is American science really at the forefront, do nations wish to toe the American line ? To tackle these questions, an analysis of ISI's database should prove instructive.
With respect to the present study, the main technical shortcoming of ISI's database is that multinational cooperative research programs often give publication credit to all participating nations independently of their true individual contribution. This leads to a redundancy level of about 9%. A similar level of redundancy arises from the apparent allocation of some journals and/or articles by ISI to more than one disciplinary area.
Description of the database
This article deals with the 48 nations with the highest publication output, i.e., a total of 6,582,457 publications (97.9% of world output). Over the 1981-1992 period, the US was in the lead with a publication volume (2.3 million articles) equivalent to about a third of the total and that was five to ten times greater than that of its closest peers (UK, Japan, USSR, Germany, France and Canada). The UK (England, Scotland, Wales and Northern Ireland) was in second place. The North Atlantic trio - US, UK, Canada - together produced more than half of the world output. Germany and the USSR were on a par. The most prolific country in the South Pacific was Australia (10th position), on the African continent, South Africa (25th) and in South America, Brazil (26th).
Disciplinary fields in the ISI databank are defined by journal sets relating to each discipline. There are 17 specific disciplines (a) plus one multidisciplinary (MUL) field which covers those publications in Nature, Science, and in National Academies of Science proceedings that could not be assigned to any one of the 17 specific disciplines. Over the 1981-1992 period, the major discipline was clinical medicine (1.2 million articles; 18% of the total volume) followed by general biology & biochemistry, chemistry and physics (12-13% each) and then by plant & animal sciences, engineering, molecular biology and neurosciences (4-7% each).
It is an interesting exercise to group these disciplines into broad categories even though there is a degree of overlap among disciplines. Thus, in the '80s, publications in biology (36%) by far outweighed those in other categories. 'White' biology (biology & biochemistry, molecular biology, neurosciences, immunology) accounted for 24% and 'green' biology (plant & animal sciences, agricultural sciences, ecology & environment) for 12%. Medicine (clinical medicine and pharmacology) and physics (physics, engineering, materials sciences, astrophysics) each accounted for 22%, chemistry for 13%, mathematics (including computer sciences) for 3% and geosciences for 2.6%. The world was decidedly more interested in living than dead matter during this period.
Multivariate analysis of the database
To determine how the production of each of the 48 nations was spread across the 18 disciplines, we performed the following simple calculation. A nation's overall production of scientific papers from 1981 to 1992 was set at 100% and the percentage output per discipline was calculated. This gave a set of 48 publication profiles or 'signatures'. Taken together, these profiles form a matrix of 48 nations (rows) times 18 disciplines (columns) which defines a multidimensional space. Items that are close to each other within this space have similarities or are correlated in some way; those that are distant are different or atypical. It is possible to calculate all kinds of distances within this space (using Chi2 metrics because publication counts are categorical variables), for example, the distances among nations, among disciplines, or between nations and disciplines.
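The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are invented toy figures (not ISI data), the function names are hypothetical, and the Chi2 distance is taken to be the usual one from correspondence analysis, in which each squared difference between two profiles is divided by the share of that discipline in the overall total. The article does not spell out the exact formula used.

```python
from math import sqrt

def row_profiles(counts):
    """Turn each nation's raw counts into fractions of its own total (its 'signature')."""
    return [[c / sum(row) for c in row] for row in counts]

def column_masses(counts):
    """Share of each discipline in the grand total of all publications."""
    grand = sum(sum(row) for row in counts)
    n_disc = len(counts[0])
    return [sum(row[k] for row in counts) / grand for k in range(n_disc)]

def chi2_distance(p, q, masses):
    """Chi2 distance between two profiles (assumed correspondence-analysis form)."""
    return sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))

# Hypothetical toy data: 3 nations (rows) x 3 disciplines (columns).
counts = [
    [50, 30, 20],    # small nation
    [100, 60, 40],   # nation twice as large with the same distribution of effort
    [10, 10, 80],    # nation concentrated on the third discipline
]

profiles = row_profiles(counts)
masses = column_masses(counts)
d_same = chi2_distance(profiles[0], profiles[1], masses)
d_diff = chi2_distance(profiles[0], profiles[2], masses)
```

In this toy case the first two nations differ in size but not in profile, so their distance is zero, which is exactly why profiles rather than absolute counts are compared.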
Distance from the centre of gravity of the multidimensional space
We shall start with one of the simplest examples, namely, the distance of each nation from the centre of gravity of the multidimensional space. But first of all, what does the centre of gravity of the space represent ? It is an artificial average world publication profile. And what does it mean if a nation is close to this centre ? It means that it has a publication profile that fairly closely resembles this world average. And what could be the reasons for this ? We suggest that the nations nearest to the centre have research policies, financial means, and human resources that allow traditional and cutting-edge research on many fronts and whose scientists are themselves innovative yet respond well to advances made elsewhere, i.e., they participate in a multi-nation scientific community with good communication facilities.
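As a rough sketch of this idea, the centre of gravity can be computed as the share of each discipline in the pooled output of all nations, and each nation's profile can then be compared with it. The figures below are invented for illustration, the function name is hypothetical, and the Chi2 form of the distance (squared differences divided by the corresponding centre value) is an assumption rather than the documented procedure of the study.

```python
from math import sqrt

def distances_from_centre(counts):
    """Return the average world profile and each nation's Chi2 distance from it."""
    rows = list(counts.values())
    grand = sum(sum(row) for row in rows)
    n_disc = len(rows[0])
    # Centre of gravity: pooled share of each discipline (assumed weighting).
    centre = [sum(row[k] for row in rows) / grand for k in range(n_disc)]
    dist = {}
    for nation, row in counts.items():
        total = sum(row)
        dist[nation] = sqrt(
            sum((c / total - g) ** 2 / g for c, g in zip(row, centre))
        )
    return centre, dist

# Hypothetical toy counts: nation -> publications in 3 disciplines.
counts = {
    "A": [40, 35, 25],
    "B": [45, 30, 25],
    "C": [5, 10, 85],
}

centre, dist = distances_from_centre(counts)
ranking = sorted(dist, key=dist.get)  # nearest to the centre first
```

Sorting the nations by this distance gives the kind of ranking discussed in the text, from the most typical profile to the most atypical.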
Over the global period 1981-1992, England and the Federal Republic of Germany were nearest to the centre with the USA, France, and the Benelux countries close on their heels. Among the 18 of the 48 nations or regions closest to the centre, 13 were West-European. The non-European countries were USA, Israel, Canada, Venezuela, Mexico (from nearest to furthest). If one leaves most of Europe and America aside, Japan was the closest country (19th position) with Australia just behind (22nd position). However, one has to reach the 27th position to encounter the first country of the former Eastern bloc, Yugoslavia. Eastern European countries are ranked in the order (from the most to the least typical): Yugoslavia, Democratic Republic of Germany, Hungary, Czechoslovakia, Poland, USSR, Romania, and, way out, Bulgaria. This order might reflect the stringency of state control of research programs and publications.
Although a 12-year time-span is rather too short to evaluate changes in publication trends on a national scale, we nevertheless calculated the distances of the nations from the centre of gravity at three specific points in time, in 1981, 1986, and 1991. There may be idiosyncrasies in the course of one particular year that are erased when considering time-spans but the following observations proved interesting. Basically, the same group of 17 or so countries remained at a considerable distance from the centre, well separated from the remaining countries. The only country that quit this group to join the remainder was Hungary which, as everyone knows, made substantial efforts to adopt the Western ethos during this time. Within the group, the only country to move substantially in the direction of the centre was Thailand. On the whole, most countries within the group remained in a fairly stable position; the movements of some (e.g. South Africa, Saudi Arabia) were erratic. As regards the countries outside this distant group, there was an overall tendency to move in toward the centre of gravity. This would seem to suggest that there is a process of global uniformisation in national publication profiles and that the norm toward which nearly all nations tend is the one established by the countries already at the centre (England, Federal Republic of Germany, USA and France, depending upon the year). The nations which moved in most markedly were : Spain, Sweden, Japan, Ireland, Italy, and Argentina whereas both Venezuela and Norway moved away.
Distances between nations according to a minimum spanning tree analysis
To show up the distances among the nations in the multidimensional space without reference to a centre of gravity, we analyzed the Chi2-distance matrix by the minimum spanning tree method using Prim's algorithm. This is a pleasing way of linking items in a multidimensional space into a tree-like diagram. Prim, who worked at the Bell telephone company, calculated the shortest length of telephone cable that would interlink several towns without resorting to any loops and without any backtracking. When applied to our matrix, his algorithm gives the shortest tree-like network that links the 48 nations on the basis of the similarities in their publication patterns, i.e., according to their distribution of effort over the 18 disciplines (see Figure. N.B.: This figure is not fixed in a 2D-plane but has the flexibility of a Calder mobile). Nations with highly similar publication patterns are neighbours; nations with distinct patterns are at the end of long branches. The shorter the distance separating two nations, the broader the line between them. There are a myriad ways of linking 48 items into a branched network but this tree is the most economical. It thus describes best the ISI data and, insofar as the ISI database reflects a greater reality, gives an overview of national scientific endeavour in a world context.
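For readers who wish to try the method, here is a minimal sketch of Prim's algorithm applied to a small symmetric distance matrix. The four items and their distances are invented for illustration and the function name is hypothetical; in the study the input would be the 48 x 48 Chi2-distance matrix of nations.

```python
def prim_mst(names, dist):
    """Return the minimum spanning tree as a list of (item_a, item_b, length) links."""
    n = len(names)
    in_tree = [False] * n
    best = [float("inf")] * n  # shortest known link from the growing tree to each item
    parent = [-1] * n          # tree item at the other end of that link
    best[0] = 0.0              # start growing the tree from the first item
    edges = []
    for _ in range(n):
        # Pick the item outside the tree that is nearest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((names[parent[u]], names[u], dist[parent[u]][u]))
        # The new item may offer shorter links to the items still outside.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

# Hypothetical distances among four items.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.4, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.4, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]

edges = prim_mst(names, dist)
total_length = sum(length for _, _, length in edges)
```

A tree linking n items always has n - 1 links and no loops; here the three links chosen are A-B, B-C and C-D, with D sitting at the end of the longest branch.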
Minimum spanning tree of nations on the basis of publication patterns
A personal interpretation of the minimum spanning tree of nations
To discover the rationale of the links in the tree, one must refer to the individual publication patterns of nations or, even better, to the results of a multivariate analysis that co-maps nations and disciplines. These have been published (see references). In the following discussion, we shall broadly associate different types of publication pattern with different branches of the tree.
Description of the minimum spanning tree
The nations connected by the broader links form a hard core of 19 countries or regions (see shading) in which the USA (4 links) and England (3 links) are the central cross-roads and the Federal Republic of Germany (5 links) is the biggest border-post. As indicated by the white circles, especially tight links exist between England and the USA, between the Netherlands and Belgium, and between three Nordic countries (Finland, Sweden and Denmark) that concentrate on life sciences and clinical medicine. The position of Norway, nearer to the UK regions and, in particular, to Wales, is due to a specific interest in disciplines to do with the land (ecology/environment, geosciences), no doubt because of its more extensive and varied natural resources.
The hard core of countries closely linked to the England-USA epicenter is formed of European Union members (top right-hand branch - I), Canada, Australia, Ireland and UK regions (middle branch - II), Austria and Scandinavia (left-hand branch - III) and Israel, an offshoot of the USA. Noteworthy features are the direct but relatively long-distance links of South Africa, New Zealand and Nigeria with Great Britain and the rest of the Commonwealth. This middle branch (II) seems to concern advanced nations with considerable natural resources or specific flora and fauna, combining interest in the agricultural sciences with plant and animal sciences, and under the dominant influence of Britain. In fact, the political, historical, and cultural links between several of the hard core nations are obvious.
Widening the circle to include nations at a greater distance (finer but not broken lines) embraces countries which are more or less directly linked to the ex-Federal Republic of Germany :
- Japan and Argentina,
- Southern Europe, i.e., Portugal and the Mediterranean countries (Spain, Yugoslavia, Greece and Turkey) with Spain moving most rapidly toward the centre of gravity of the system (see above).
- some Eastern European countries: ex-East Germany, Hungary, Czechoslovakia. Of these, Hungary was the fastest mover toward the centre.
Extending the circle even further (to include broken lines) embraces South East Asia and the rest of Eastern Europe. The Latin American countries are dispersed within the tree but are all weakly but preferentially linked to the hard core nations despite the historical influences of Spanish and Portuguese culture in South America. This could be explained by differences in the Mediterranean and South American biotopes leading to different approaches to land exploitation and to recent technological cooperation with other European countries.
What differentiates the publication patterns of the nations of the bottom half of the tree (IV to VIII) from those of the hard core nations ?
Basically, nations of the bottom half of the tree publish preferentially in the disciplines of the industrial revolution that deal with the make-up and function of matter, i.e., traditional chemistry and physics. They have not yet all come to grips with 'modern' life sciences, clinical medicine, the environment, or computer sciences. However, they split into subgroups depending upon whether they are evolving toward high-tech (as South Korea and Taiwan), toward the biological sciences (as some Eastern European countries), or remain decidedly attached to the land.
A country that has moved fast toward the hard-core nations is Japan. Since the 1980s Japan has sought active collaboration with the West in order to become competitive in the life sciences but without ever relinquishing its long-standing expertise in engineering and technology. The Asian 'dragons', Taiwan and South Korea, entered the international scene after Japan and, during the 1981-1992 period, their publication patterns, characterized by a preference for chemistry and physics, were closer to those of the ex-East European bloc than to those of the hard-core nations, except for a particular attraction to high-tech. South Korea has focused on materials sciences and, increasingly, on computer sciences. However, like the majority of growth-addicted societies, they do not show undue concern with cleaning up the pollution generated by their technologies, and health care will probably only become a demand to be met as the technological boom continues and life expectancy rises. This behaviour contrasts with that of Thailand, a country more oriented toward medicine, probably because of its special relationship with Sweden which is extremely active in this field.
The countries of the East European bloc are divided. The ex-Democratic Republic of Germany, Hungary and Czechoslovakia - the three members closest to the hard core of nations - showed greater interest in the biochemical and clinical fields as would be expected of countries with an established pharmaceuticals industry whereas Poland and Romania were much involved in chemistry and heavy industries. Polluted landscapes did not, however, inspire much activity as regards ecological and environmental issues.
The agricultural, plant and animal sciences were a major preoccupation of poorer countries often with desert lands (India, Egypt and Saudi Arabia), as they were of the Commonwealth and UK regions, but the distinction lies in the lesser interest of the former in biochemistry and clinical medicine.
Greece still focusses on the well-established basic and applied sciences instead of the emerging life science disciplines. It displays interest in the international world of publication in ecology/environment probably because of the industrial pollution that has attacked its national architectural heritage and that is rapidly spreading in the Mediterranean.
The above broad interpretations, although based on the individual publication patterns of the nations, should not be taken as hard and fast rules. Although a nation can form part of a group with typical features, it never relinquishes its individuality. However, I would like to stress the following points :
(a) the figure is an objective mathematical representation of the content of the ISI database, however subjective the interpretations may be,
(b) it has an uncanny tendency to reflect the economic, technological and cultural links between nations. It is, at the same time, an illustration of historical links and of a North-South divide,
(c) if one refers to the individual publication patterns, one realizes :
- how much geography contributes to publication patterns (e.g., sciences relating to the land surface and primary sector of the economy (plant and mineral kingdoms) and geosciences (the sea, earth's mantle and mining)),
- how the countries that still preferentially indulge in agricultural sciences and in the sciences of the industrial revolution (chemistry, physics) - i.e., the land and transformation industries - concentrate relatively less on clinical medicine and the environment. Publication profiles between 1981 and 1992 thus reflect the time-lags in the preoccupations of the less versus more developed countries.
Historians of science and research policy makers may have further comments and opinions to offer on the above minimum spanning tree analysis.
a. List of 17 scientific disciplines (alphabetical order) : Agricultural sciences, astrophysics, biology & biochemistry, clinical medicine, computer sciences, chemistry, ecology & environment, engineering, geosciences, immunology, materials sciences, mathematics, molecular biology, neurosciences, pharmacology, physics, plant & animal sciences.
Miquel JF, Okubo Y. Structure of international collaboration in science: Part II: Comparisons of profiles in countries using a LINK indicator. Scientometrics 29, 271-297, 1994.
Ojasoo T, Doré JC, Miquel JF. Regional cinderellas. Nature 370, 172, 1994.
Miquel JF, Ojasoo T, Okubo Y, Paul A, Doré JC. World science in 18 disciplinary areas: Comparative evaluation of the publication patterns of 48 countries over the period 1981-1992. Scientometrics 33, 149-167, 1995.
Doré JC, Ojasoo T, Okubo Y, Durand T, Dudognon G, Miquel JF. Correspondence factorial analysis of the publication patterns of 48 countries over the period 1981-1992. J Am Soc Inform Sci 47, 588-602, 1996.
Garfield E. How do we select journals for Current Contents ? Current Contents Nov. 5, 1979.
Carpenter M, Narin F. The adequacy of the Science Citation Index (SCI) as an indicator of international scientific activity. J Am Soc Inform Sci 32, 430-439, 1981.
Moed HF, Burger WJM, Frankfort JG, van Raan AF. On the measurement of research performance: The use of bibliometric indicators. Center for Science and Technology Studies, University of Leiden, Leiden, 1983.
Callon M, Law J, Rip A (Eds). Mapping the dynamics of science and technology. Macmillan Press, London, 1986.
Anderson J, Collins PMD, Irvine J, Isard PA, Martin BR, Narin F, Stevens F. On-line approaches to measuring national scientific output: A cautionary tale. Science & Public Policy 15(3), 153-161, 1989.
Hamilton DP. Publishing by - and for? - the numbers (News and Comment). Science 250, 1331-1332, 1990.
Martin BR. The bibliometric assessment of UK scientific performance. A reply to Braun, Glänzel and Schubert. Scientometrics 20, 333-357, 1991.
Ernst E, Kiefenbacher T. Chauvinism. Nature 352, 560, 1991.
Pendlebury D. Research papers: Who's uncited now ? Science 251, 25, 1991.