Genes, Peoples and Languages

Genetic Distance and Language Affinities Between Autochthonous Human Populations

The first two of the following tables were originally drawn from one article in Scientific American, "Genes, peoples and languages," by Luigi Luca Cavalli-Sforza (November, 1991). This reported the results of genetic mapping of human DNA affinities, the newest theories about larger families of human languages, and a comparison between the two. While this was a thrilling prospect for understanding the early history of languages and the initial dispersal of the human species across the world, some think it is too speculative and questionable for us to have great confidence in the results. Nevertheless, Cavalli-Sforza developed the material in a recent book, Genes, Peoples, and Languages [University of California Press, 2000]. Other recent information on genetic mapping can be found in The Seven Daughters of Eve, by Bryan Sykes [W.W. Norton & Company, 2001].

Since my graduate studies in the 1970's, which included some linguistics, such as a course in historical linguistics using W.P. Lehmann's Historical Linguistics: an Introduction [Holt, Rinehart, and Winston, 1962], I had been out of touch with recent developments. When I saw Cavalli-Sforza's article, the idea that genetic mapping and the higher-order grouping of language families would be logical developments of previous knowledge did not seem so outrageous. It still doesn't. The genetic mapping may still be reasonable enough, though I understand that there are some criticisms.
On the linguistic side, however, I now find a very thorough and sensible discussion of the matter in R.L. Trask's Historical Linguistics [Arnold & Oxford University Press, 1996, 1998], "Very remote relations," pp. 376-404. Cavalli-Sforza's table matching linguistic and genetic affinities, like the first two tables here, is included (p. 401). We also have the third edition of Lehmann's book [Winfred P. Lehmann, Historical Linguistics, Routledge, 1992, 1997].

According to Trask, there continues to be little support among linguists for most of the higher order groupings of human languages. Indeed, there is still relatively little support for Joseph Greenberg's proposal that most of the indigenous languages of the New World belong to a single language family, divided (as indicated below) into three branches. (Opinion is even still divided on the grouping of Japanese and Korean with the Altaic languages, though there are no other suggested affinities that I know of.) Trask is fairly respectful of the theory of the "Nostratic" languages, but the basis of most of the other theories ("Eurasiatic," "Sino-Caucasian," "Austric," and "Indo-Pacific") is in statistical comparisons that may not demonstrate any stronger relationship than what can be expected from random and coincidental similarities (or borrowings).

For instance, the Persian word for "bad" is bad, just as in English, pronounced with a slightly different quality, but with a vowel very much like the English one. But this is a complete coincidence. There is no other evidence or indication of a common origin (or a borrowing) for these words (though, otherwise, there is plenty of evidence that English and Persian are ultimately related as Indo-European languages).
What is coincidental and what is evidence of affinity becomes a mathematical question, and many of the theorists may be unsophisticated enough in statistics to be unable to distinguish between the random similarities that can be expected and the systematic similarities that are evidence of common origins. The natural occurrence of linguistic change, which is rather like genetic drift in human DNA, may, over a few thousand years, as Trask says (p. 376), obliterate all evidence of origins. The hope for future understanding, however, may lie in just such statistical methods as are already used to separate, for audio and images, the "noise" that accumulates from the "signal" that remains.

How even to apply those methods, however, is a problem all by itself. Unlike genetic affinities, where the elements of comparison are just the chemical constituents of the DNA, RNA, or proteins, in languages prior decisions must be made about the words or grammatical structures to be compared. Phonetically similar words with entirely unrelated meanings, for instance, will not qualify for comparison in the first place, since it is not just the words but the meanings that are supposed to have a common origin.

In the following charts, the genetic one may still be taken, with some reservations, as being rather like what we should expect. The linguistics charts can also be taken as reasonably accurate for everything except the groupings, especially the higher order ones, that are marked with question marks. Even these groupings are suggestive, since the idea that most or even all human languages have a common origin is entirely reasonable in itself; and, once the techniques are developed for tackling the evidence, something like these results may emerge. On the other hand, the grouping of the subdivisions of Indo-European languages also seems to be rather more of an open question than some of the articles in Scientific American have suggested.
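The question of what counts as coincidence can be given a rough numerical shape. The sketch below is only an illustration, not a method taken from Trask or Cavalli-Sforza: it assumes that two unrelated languages are compared over a fixed list of meanings, and that each pair of words has the same small, independent chance of looking alike by accident. The list size (`N_MEANINGS`) and the per-word chance (`P_CHANCE`) are hypothetical values chosen for the example, not measurements.

```python
from math import comb

# Hypothetical illustration values (assumptions, not data from any study):
N_MEANINGS = 100   # number of meanings compared between two languages
P_CHANCE = 0.02    # assumed chance that one word pair looks alike by accident


def expected_matches(n: int, p: float) -> float:
    """Mean number of accidental look-alikes under a binomial model."""
    return n * p


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    accidental look-alikes among n independent comparisons."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print("expected accidental matches:", expected_matches(N_MEANINGS, P_CHANCE))
    for k in (1, 3, 5, 10):
        print(f"P(at least {k} accidental matches) =",
              round(prob_at_least(k, N_MEANINGS, P_CHANCE), 4))
```

With these made-up numbers, about two look-alikes are expected from chance alone, so an isolated resemblance such as Persian bad and English bad shows nothing by itself. Real comparisons are harder than this model, since the chance of resemblance differs from word to word and borrowings must be excluded first.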
The divisions shown both for Indo-European and Afro-Asiatic languages should be taken as tentative. The ultimate relations of Egyptian are especially problematic because most of the languages to which it may be closely related, the old "Hamitic" group, are only known from their modern versions, while Egyptian is attested from 3000 BC.

The "human evolution" chart, given fourth below, is entirely separate from the others. Its uncertainties are those of an entirely different discipline from linguistics, or even from genetics. It is in physical anthropology where the arguments over the number of human species and their affinities are to be found. Since the ideas there seem to change more or less with the tides, the chart here may be taken as what Plato calls a "likely story."

Genetic Distance Between Autochthonous Human Populations

"Genetic drift" is the phenomenon by which small mutations in DNA (or RNA, and the proteins that are coded by them) add random variations over time to genetic material, resulting in differences between isolated groups of animals, whether of different species or of the same species that are not inter-breeding. Thus there is about a 5% difference in DNA between humans and our closest Primate relations, Chimpanzees. Between human populations, there is always much less than 0.1% difference. What these differences are and how much they vary can be used to construct a tree showing the relationships between human populations.

The following tree is the result of such research, reported by Luigi Luca Cavalli-Sforza. The numbers across the top show the percentage difference in DNA, which is thus no more than 0.03% for all human beings. The most dramatic characteristic of the tree is the division between populations in sub-Saharan Africa and the rest of humanity.
This is usually interpreted to mean that modern humans originated in Africa and that the population from which the rest of humanity descended left Africa somewhat less than 300,000 years ago, ultimately replacing earlier humans, like the Pithecanthropines (Homo erectus, like Peking Man, etc.), who had also evolved in Africa but left many thousands of years earlier.

Part of this research was the theory of "Eve," a single female in Africa, around 200,000 years ago, from whom every living human being is now descended. This does not mean that there were not other human females -- there were -- or that we are not descended from them too -- we are. The theory is based on the circumstance that some human genetic material is contained in the mitochondria, little organs in a cell outside the nucleus (where most genetic material is contained). Sperm cells do not pass on their mitochondria to a fertilized egg and so all human mitochondrial DNA is inherited from the mother. If a woman has only sons (a highly desirable result in many traditional cultures), then her own mitochondrial DNA is actually lost. Over time, this seems to have happened to all lines of descent of mitochondrial DNA, except one, the line from "Eve."

Another interesting feature of the chart is the closeness of American Indians to modern populations across Europe, the Middle East, and northern East Asia. Thus, curiously, Europeans are more closely related to American Indians than to Polynesians.

Finally, it is noteworthy that skin color is not at all helpful in providing clues to genetic affinity. The darkest colored people on earth, in Africa, India, Melanesia, and Australia, are scattered between groups that are only distantly related. Dark skin color is certainly a function of living under the equatorial sun for many generations, but all human populations have the genetic wherewithal to make that adaptation.

The March 29, 1999, Newsweek reports (p.
72) that population geneticist Jody Hey and anthropologist Eugene Harris, of Rutgers University, reported in the Proceedings of the National Academy of Sciences that, using DNA techniques again, the African population split from the non-African about 189,000 years ago. The article presents this as well before the emergence of "modern" Homo sapiens and somewhat surprising, but it actually seems fairly consistent with the numbers presented above. If Homo sapiens goes back 300,000 years and "Eve" is around 200,000 years ago, then it is not beyond the bounds of credibility that we could get the basic split in the populations not too long after that. The margin of error is also probably pretty large.

Language Affinities Between Autochthonous Populations

The second tree below essentially takes the first one but draws the tree over again using language rather than genetic affinities. What is of interest are the similarities to the first tree, indicating that human languages, which certainly antedate the 300,000 year mark (see Derek Bickerton, Language and Species [University of Chicago Press, 1990]), may also have a common origin in Africa itself. Many of the higher order groupings, however, as discussed above, are rather speculative.

The theory of the "Nostratic" languages, which combines Afro-Asiatic (Hamito-Semitic), Indo-European, Ural-Altaic, Dravidian, and American Indian languages, is really the most dramatic but also may have the most credible evidence in common vocabulary items and systematic phonetic relationships. The grouping of Chinese with Basque, which otherwise seems unrelated to any other languages, seems more than a little bizarre but, if true, would be evidence of population movements and distribution prior to the early historical presence of Indo-European speakers across northern Europe and Asia. I have never seen explanations of the actual evidence for the Basque-Chinese connection.
I have recently updated the "Afro-Asiatic" part of this tree, after some complaint from correspondents that it was not accurate enough. The treatment is based on the third edition of Lehmann's Historical Linguistics (p.84).

Language Affinities

The final tree takes the previous one and untangles the language families, filling in some extra detail. Some information on the Indo-European system may be examined at "Knowing" Words in Indo-European Languages. The table at Languages with more than 30,000,000 Speakers as of 1993 may also be of interest.

A 1999 correspondent informed me that Nostratic is (or should be) viewed as the parent rather than the derivative of Eurasiatic. I find this shown in Steven Pinker's Words and Rules [Basic Books, 1999], p. 212. This is also what Cavalli-Sforza has in Genes, Peoples, and Languages [2000], though the hypothetical nature of the groupings is strongly indicated. However, in Trask's Historical Linguistics (p.401), Nostratic and Eurasiatic "superphyla" are shown overlapping, as though they could be competing groupings rather than structures that take each other into account. Given the uncertainty of the enterprise, it is unclear to me how the matter really should be construed. Thus the following table is rearranged to make Nostratic the larger category, trying to preserve the other relationships, as with the controversial but daring connection to American Indian languages and the "Basque-Caucasian" group -- while the table above is left with Eurasiatic as the larger grouping.

Pinker's diagram groups Nostratic with Sino-Tibetan and with a "New Guinea" group. This super-family is not given a name, and it is not clear if the New Guinea group is the same as the "Indo-Pacific" one above.

The theory that most New World languages belong to the same language family is particularly associated with Joseph Greenberg. This theory is not in general favor, and the evidence for it was certainly tentative in the first place.
Apart from Greenberg, however, there is no indication of what the affinities of most New World languages might be. It may just be that, like the large scale structure of Old World languages, the connections are lost in the noise of the millennia.

Human Evolution

The final chart shows one view of human evolution, with various species of genus Australopithecus back in the Pliocene, leading to three species of genus Homo in the Pleistocene. Homo erectus had already spread into Asia, as far away as China and Java, but the archaic form of Homo sapiens, as seen in the genetic tree above, spread into Asia all over again. The blue names and lines are for the glacial episodes that characterize the Pleistocene. The names given are, first, for North America (e.g. Wisconsinan), second, for Europe (e.g. Würm). The scale at left is the logarithm of the year, which compresses the Pliocene, even though it was actually longer than the Pleistocene.

Views about human evolution are constantly under revision, as they have been for decades. No great reliance should be placed on any of them. Recently several new fossils have been found, multiplying the number of species again. The new taxa may or may not be permanent.

One of the longest running arguments is over the Neanderthals (i.e. "New Man Valley"), whether they are in the main line of human evolution or form a doomed offshoot among the glaciers of Europe. The "doomed offshoot" view may be the most common now, though not long ago there was a claim that distinctive Neanderthal tooth characteristics still exist in European populations, and there is considerable evidence of intermediate forms, though these are often ascribed merely to crossbreeding. The argument is likely to go a few more rounds before, if ever, being settled.

Nevertheless, now there is an approach that may put the matter beyond doubt. Genetic material can be recovered from Neanderthal bones, and Bryan Sykes (The Seven Daughters of Eve [W.W.
Norton & Company, 2001]), says that, on the basis of the DNA evidence, no modern Europeans have turned up who genetically are descendants of Neanderthals [pp.125-126]. It is hard for a layman to argue with such an approach, and the "doomed offshoot" view now may be said to be credibly affirmed -- except for ever more recent claims about intermarriage of Neanderthals with modern humans in Iberia. I am not sure what kind of genetic evidence there is for that.

While it already became common back in the '60's to at least put Neanderthals in the same species as modern humans (Homo sapiens), Time magazine recently (August 23, 1999, pp.50-59) was listing them under one or two other species (H. heidelbergensis or H. neanderthalensis). The earliest fossil humans found thus continue to suffer from questions about whether they are even truly human. We may also be seeing the effects of a personality typological division between "lumpers" and "splitters," i.e. those who like to consolidate categories and those who like to multiply them. Where the evidence underdetermines the theory, such preferences, like those for cyclical or linear time (cf. Stephen Jay Gould's Time's Arrow, Time's Cycle), strongly influence the view taken. The genetic evidence, again, can provide a way out of this, since a percentage threshold of common DNA can be stipulated for speciation. If this principle has been used for the assignment of Neanderthals, I am not yet aware of it.

The January 2000 Scientific American has an article, "Once We Were Not Alone" (p.56), that confirms developments and tendencies evident in the Time article. Neanderthals are in two different species again, and a new species of Homo (H. rudolfensis) has been added, a bit older than previous members of the genus -- H. habilis (as previously with A. afarensis) was for some time seen as the oldest member of its genus. Most striking, however, is the demotion of some Australopithecines from that status. A. robustus and A.
boisei are now put in a new genus, Paranthropus, along with a new "robust" (i.e. big) species, P. aethiopicus. The splitters have evidently been busy lately, breaking off Neanderthals and breaking up the Australopithecines.

Below I have reproduced a more recent [2005] tree of human evolution from the Smithsonian Intimate Guide to Human Origins, by Carl Zimmer [Smithsonian Books, p.41]. This is not on a logarithmic scale, so the divisions of time are equally spaced. Here the late Australopithecines have been broken off into a new genus, Paranthropus. We see a recent discovery, Homo floresiensis, which has been called a species of "Hobbits" because of their size. Older hominids, more than 4.5 million years old, are also included. The thick lines represent the periods for which fossils have been found. The thin lines connecting the lineages are speculative.

The diagram apparently represents the view that most of the fossils we possess are from parallel descent and evolutionary "dead ends" in relation to modern humans. This may be a preference not unlike the difference between "lumpers" and "splitters," i.e. to be contrasted with those who suppose that some line represents direct ancestors, both for us and for some earlier form. The classic example of such a dispute may be that between those who think that birds descend directly from dinosaurs and those who have some confidence that the common ancestor lies in the undiscovered fossils of older reptiles.

When I took Anthropology as a Freshman in 1967, the professor used a book whose author figured that Australopithecines -- not many were known at the time -- evolved directly into Homo erectus, Homo erectus into Neanderthals, and Neanderthals into modern humans. This was an extreme version of the "direct lineage" preference and hasn't held up well either in light of the discovery of the coexistence of many forms or in terms of the current Zeitgeist for analyzing evolution.
The "branching lineage" preference may account for the notion becoming settled that Neanderthals are a different species from H. sapiens, reinforced by some genetic results that Neanderthal DNA is not ancestral.

The Semitic and Other Afroasiatic Languages

Some of the oldest attested languages in the world, from the oldest civilizations, are in the family of the Afroasiatic languages. The oldest in the group is Ancient Egyptian, which is known from one of the earliest writing systems, hieroglyphics. All the other languages here that are attested from ancient times are in the Semitic sub-family. The oldest of these is Akkadian, which evolved into the closely related Babylonian and Assyrian languages. The writing system of Akkadian, however, cuneiform, was not created by the speakers of that language, but by the speakers of the unrelated Sumerian. Akkadian came to prominence and, indeed, dominance with the kingdom of Sargon of Akkad. Sumerian appears to have all but died out as a spoken language by the end of the III Dynasty of Ur, c.2000 BC.

Egyptian itself died out as a spoken language as recently as the 17th century AD, under the influence of Arabic and Islâm. Nevertheless, the latest form of Egyptian, Coptic, survives as the liturgical language of the Coptic Church. On the other hand, all the forms of Akkadian had died out by Late Antiquity.

Egyptian is not closely related to the Semitic languages, but its other affinities are unclear. The other groups of Afroasiatic languages, Cushitic, etc., which used to be grouped together with Egyptian as the Hamitic languages, are only recently attested. Their ancient antecedents and the nature of their relationship to the rest of the language family are unknown. They now appear to be as distant from each other as from the Semitic languages, and the Hamitic category is no longer regarded as phylogenetically useful.

An interesting comparison is between the verb systems of Egyptian and Semitic languages.
Most Semitic languages, like Hebrew and Arabic, have two verb tenses, with prefixes for an imperfect and suffixes for a perfect. These express temporal aspect more than tense, i.e. incomplete action, in present or future, for the imperfect, and complete action, whether in present or past, for the perfect. Egyptian retains these forms, but they are little used, mostly replaced by participles with pronominal suffixes. On the other hand, the Eastern or Akkadian branch of the Semitic languages has three verbal tenses. The suffixed form is a stative, expressing states, while the "preterite" and "present" (perfect and imperfect) are both prefixed inflections. This by itself might be a clue that Egyptian is more closely related to the Western Semitic languages than to the Eastern; but it is only one indication among many, and otherwise there are many differences between Egyptian and all Semitic languages.

The other writing system indicated on the chart is the alphabet developed among the Canaanite group of languages. This is the first alphabet, and forms of it, often by way of Phoenician, eventually passed to Aramaic, Greek, Arabic, Old South Arabian, Ethiopic, Middle Persian and even to India as the Brahmi script. After some letters were adapted as vowels in Greek, we get other derived alphabets with vowels: Latin, Armenian, Georgian, and Cyrillic. Latin especially now writes many modern languages, often completely unrelated to European or Middle Eastern languages (e.g. Chinese). A cuneiform alphabet was used at Ugarit but then did not spread.

The homeland or place of origin of the Semitic languages is uncertain. There does not seem to be evidence of languages from any other language families from northern Syria all the way down to Yemen. In Mesopotamia there is evidence of successive encroachments of Semitic speakers, starting with Akkadian but then continuing with Amorite and Aramaic speakers. Most of the mystery is exactly where these would have originated.
If Amorite is a Canaanite language, it may simply have spun off from the historic homelands of Phoenician and Hebrew. Aramaic, however, then seems to drop out of nowhere. For a long time, I think opinion was that it may have come out of the Peninsula, as Arabic certainly did later. That the Chaldeans appeared in southern Mesopotamia may be evidence of that. But now there is some question that the Chaldeans were even Aramaeans, as previously assumed -- though there is certainly no evidence that they were anything else. So the matter is rather up in the air.

The structure of the Afroasiatic family of languages here is based on the treatment of Robert Hetzron, "Afroasiatic Languages," in The World's Major Languages, edited by Bernard Comrie [Oxford University Press, 1987, pp.647-663]. The treatment of the Chadic languages is from Paul Newman, "Hausa and the Chadic Languages," also in Comrie. The treatment of Aramaic and Syriac is based on W.M. Thackston, Introduction to Syriac [Ibex Publishers, Bethesda, Maryland, 1999], and on Robert D. Hoberman, The Syntax and Semantics of Verb Morphology in Modern Aramaic [American Oriental Society, New Haven, Connecticut, 1989]. The alternative characterization of the branches of the Semitic languages as "north peripheral," "north central," "south central," and "south peripheral" derives from Winfred P. Lehmann, Historical Linguistics [Routledge, 1992, 1997, p.84]. There seems to be some question about whether Aramaic is more closely related to the Canaanite group or to Arabic, or equally distant from both. The development of Akkadian is described in David Marcus, A Manual of Akkadian [University Press of America, 1978] and that of Egyptian in Sir Alan Gardiner, Egyptian Grammar [Oxford University Press, 1927, 1964].
Sources and Related Links

The Semitic and Other Afroasiatic Languages
Glen Gordon on "Genetic Distance and Language Affinities Between Autochthonous Human Populations"
Philosophy of Science
Philosophy of History
Kelley L. Ross, Ph.D.
Ethnologue: Languages of the World, 15th ed., Raymond G. Gordon, Jr.