Human Biology Investigating the origins of eastern Polynesians using genome-wide data from the Leeward Society Isles

MeLeona

Topic starter

Jul 03, 2021 12:05 am

Abstract

The debate concerning the origin of the Polynesian-speaking peoples has been recently reinvigorated by genetic evidence for secondary migrations to western Polynesia from the New Guinea region during the 2nd millennium BP. Using genome-wide autosomal data from the Leeward Society Islands, the ancient cultural hub of eastern Polynesia, we find that the inhabitants’ genomes also demonstrate evidence of this episode of admixture, dating to 1,700–1,200 BP. This supports a late settlement chronology for eastern Polynesia, commencing ~1,000 BP, after the internal differentiation of Polynesian society. More than 70% of the autosomal ancestry of Leeward Society Islanders derives from Island Southeast Asia, with the lowland populations of the Philippines as the single largest potential source. These long-distance migrants into Polynesia experienced additional admixture with northern Melanesians prior to the secondary migrations of the 2nd millennium BP. Moreover, the genetic diversity of mtDNA and Y chromosome lineages in the Leeward Society Islands is consistent with linguistic evidence for settlement of eastern Polynesia proceeding from the central northern Polynesian outliers in the Solomon Islands. These results stress the complex demographic history of the Leeward Society Islands and challenge phylogenetic models of cultural evolution predicated on eastern Polynesia being settled from Samoa.

Introduction

The cultural and linguistic unity of the islands and atolls of the central Pacific was first documented in detail by Johann Reinhold Forster, a naturalist on James Cook’s second voyage of discovery to the Pacific (1772–1775). He suggested that the similarity of the languages spoken there, now known as Polynesian, reflected a comparatively shallow time-depth since their dispersal¹. Forster’s seminal comparative study of Austronesian languages identified the lowland region of the Philippines in Island Southeast Asia (ISEA) as the ultimate source for the Polynesian languages and proposed a long-distance migration from there by the ancestors of today’s Polynesian speakers. This appeared to be the only explanation for the striking difference in phenotype that he observed between the peoples of the central Pacific and those of the intervening region, which is now known as Melanesia. Herein, the terms Melanesia and Micronesia are used in their geographical sense. We use the term Polynesia to include all islands and atolls whose inhabitants speak Polynesian languages, including 23 found throughout Melanesia and Micronesia, referred to as outlier Polynesia (Fig. 1a).

Separating the demographic histories of Polynesia and Melanesia became difficult to sustain with developments in archaeology during the second half of the 20th century. These established that the settlement of southern Melanesia (Santa Cruz, Vanuatu, New Caledonia and Fiji) and western Polynesia (Tonga, Samoa, Niue and Futuna) is marked by the same archaeological horizon, known as the Lapita Cultural Complex (LCC). The LCC first appears in northern Melanesia (the Bismarck Archipelago, Bougainville, and the Solomon Islands main chain) ~3,450–3,250 BP, and quickly spread into southern Melanesia ~3,200–3,000 BP, reaching Tonga and Samoa ~2,900 BP^2,3,4. At the same time, the study of comparative linguistics has shown that the Oceanic branch of the Austronesian phylum of languages, of which Polynesian is a member, is spoken throughout most of Melanesia and parts of coastal New Guinea, and appears to be a recent intrusion from ISEA⁵. So while there is considerable overlap between the distributions of the LCC and the Oceanic languages, there remains a phenotypic divide between southern Melanesia and western Polynesia, which is observed between Fiji and Tonga^6,7.

A central theme in this debate is the extent to which the development of the LCC involved local people in the Bismarck Archipelago of northern Melanesia^8,9,10. An alternative is that the LCC represents the arrival of a largely pre-formed cultural package carried by speakers of proto-Oceanic languages from Taiwan, via the Philippines, in ISEA¹¹. Hypotheses are placed on a continuum from a dendritic, radiating, phylogenetic model of cultural evolution that relies on the relative isolation of populations¹², to one based on complex ongoing biological and cultural interaction between groups, leading to reticulated networks of genes and culture⁹. A compromise position has been promoted by the recognition of a Lapita homeland in the Bismarck Archipelago¹⁰, together with evidence that the genomes of contemporary Polynesians contain 20–30% ancestry typical of northern Melanesia and New Guinea^13,14. This posits a period of limited cultural and genetic admixture involving migrants from ISEA during the early LCC phase in northern Melanesia ~3,450–3,250 BP¹⁵. Polynesian society then developed in relative isolation following the pioneering settlement of Tonga and Samoa ~2,900 BP¹².

Genetic evidence for this intermediate model is provided by the presence of members of Y chromosome haplogroup (hg) C2a-M208, together with its daughter lineage C2a1-P33, among Polynesian speakers^16,17. This is seen as a proxy for male-mediated admixture from northern Melanesian and New Guinean sources into the gene pool of migrants from ISEA during the formative period of the LCC in the Bismarck Archipelago, prior to the settlement of southern Melanesia and western Polynesia^13,18. In contrast, the near fixation in Polynesian-speaking groups of the mitochondrial lineage B4a1a1 is seen as evidence of a predominantly ISEA maternal heritage^13,19. Subsequent research, however, has shown that B4a1a1 is widespread throughout northern Melanesia²⁰, including regions that show no evidence of autosomal admixture with people from ISEA²¹. Alternatively, therefore, hg B4a1a1 might also have been present in northern Melanesia before the emergence of the LCC^22,23. Similar ambiguity now exists over the origins of paternal lineage C2a-M208, due to its presence in ISEA²⁴ and rather low overall frequencies in the Bismarck Archipelago and coastal New Guinea (https://doi.org/10.1093/molbev/msr186).

An important advance in this debate is the recovery of ancient genomic DNA from LCC contexts on Vanuatu (~2,900 BP) (n = 3) and post-Lapita Tonga (~2,500 BP) (n = 1), since the results indicate people with close to 100% ancestry related to an ISEA heritage²⁵. These data show that some settlers of the LCC period appear to have transited northern Melanesia and New Guinea from ISEA without receiving any significant amounts of genetic admixture. A second major finding is that the 20–30% ancestry originating from northern Melanesia and New Guinea, detected in contemporary genomes from the eastern fringe of southern Melanesia and western Polynesia, appears to have arrived during the 2nd millennium BP (1,900–1,200 BP). This result is consistent with post-LCC movements of people into southern Melanesia and western Polynesia, in a process of polygenesis, being responsible for the differences in phenotype observed between the two regions⁶.

The potential significance of this proposed post-LCC migration for the phylogenetic approach to cultural evolution cannot be overstated. This is because the model is based on an Ancestral Polynesian Society (APS) developing in a western Polynesian homeland during the mid 3rd millennium BP, followed by a rapid settlement of eastern Polynesia ~2,200 BP¹². The settlement of eastern Polynesia, however, has witnessed significant reductions in the earliest secure radiometric dates in recent years. These currently stand at ~950 BP and come from Rai’atea in the Leeward Society Isles^26,27, thereby excluding the original calibration for the model and subsequent revisions to it²⁸. The archaeology for the phylogenetic model can also be challenged because the evidence post 2,500 BP suggests isolation of Tonga and Samoa, rather than the interaction invoked for the development of Proto-Polynesian language²⁹. By ~950 BP, society in western Polynesia was differentiated, both culturally and linguistically, indicating that, if this late chronology is accurate, the source population for eastern Polynesia was likely a regional group rather than the hypothetical APS^29,30.

A central component of the original phylogenetic model is the long-standing sub-grouping of the Polynesian languages. The initial divergence of Nuclear Polynesian from the Tongic languages is followed by a second-order split, between Proto-East Polynesian (Rapa Nui, Marquesan and Tahitic) and the rest of the Nuclear Polynesian languages (Samoic and all the Polynesian outlier languages)³¹ (Fig. 1b, left-hand tree). This sub-grouping recognizes the separation of Tongic and Samoic but is difficult to reconcile with a settlement of eastern Polynesia commencing ~950 BP, since it necessitates the second-order split, involving Proto-East Polynesian, to occur up to ~1,200 years earlier. An alternative linguistic sub-grouping that places the East Polynesian languages together with those of the central northern outliers (east coast of the northern Solomon Islands) provides a potential solution for the apparent discordance between archaeology and language^32,33 (Fig. 1b, right-hand tree). This also challenges the orthodoxy within Polynesian studies that eastern Polynesia was settled directly from Samoa^11,12,28. For Kirch and Green²⁸, Samoa is ancient Hawa’iki, the cradle of Polynesian culture. In contrast, for Wilson³² Hawa’iki represents the ancient name for the Leeward Society Isles, which are referred to as the cultural and spiritual hub of eastern Polynesia in oral histories of the region, from where other islands and atolls were settled³⁴.

The Leeward Society Isles, therefore, are of central importance to understanding the reasons for these conflicting signals from archaeology and language. If the ancestors of the Leeward Society Islanders experienced the same episode of ancient admixture as people in western Polynesia and outlier Polynesia during the mid 2nd millennium BP²⁵, this would support the late settlement chronology. In this study, we report the first genomic data from Bora Bora, Rai’atea and Taha’a, three of the Leeward Society Isles. We use the analysis of genotype and haplotype data to ascertain whether the signals of admixture present in these eastern Polynesian populations are similar to those from western and outlier Polynesia and identify potential donors to the ancestors of the Leeward Society Islanders. Further insights into the demographic history of eastern Polynesia are provided by the first deep re-sequencing of Polynesian Y chromosomes, complemented by high-resolution genotyping of key paternal and maternal lineages from the Leeward Society Isles and New Zealand.

Results

Data

Here we present a new genomic dataset sampled from the Leeward Society Islands, eastern Polynesia. We report high-resolution autosomal genotyping data from 30 individuals, complemented by genotyping and/or re-sequencing of uniparental loci (mtDNA and Y) from 81 individuals, including seven Y chromosomes re-sequenced by a target-capture method. In addition, we present new uniparental data from 49 Maori individuals sampled in New Zealand (Supplementary Tables S1–S3). The dataset is analyzed together with publicly available data from Island Southeast Asia, Melanesia and Polynesia (Supplementary Tables S1, S4 and S5). For detailed information about samples used in the present study, please refer to the Materials and Methods section.

Autosomal analysis

The first two PCs of the principal components analysis (PCA, Fig. 1c) account for 38% of the variation in the studied dataset. The close overlap between eastern Polynesians and Samoans on the PC1 axis suggests similar amounts of genetic ancestry shared with New Guinea and northern Melanesia. The model-based analysis of autosomal SNPs using ADMIXTURE³⁵ shows that, at K = 4, 70–80% of the Leeward Society Islander genomes can be characterized by the component typical of ISEA/East Asia (Fig. 2a); the remaining 20–30% of their genetic ancestry is best represented by Papuan speakers from New Guinea (light purple). From K = 5, Polynesians form their own ancestry component (green), which, like their deflection on the PCA plot, is most likely due to genetic drift or, alternatively, to cryptic relatedness or extreme inbreeding in the studied populations. However, the latter explanations are unlikely, given the lack of close relatives (up to third-degree, inclusive) in the four Polynesian groups and the normal range of inbreeding coefficients compared with other human populations (F_IS, Supplementary Table S6).
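As a rough illustration of the inbreeding coefficient mentioned above, the sketch below computes F_IS at a single biallelic SNP as 1 - H_obs / H_exp, where H_exp is the Hardy-Weinberg expected heterozygosity. This is the textbook single-locus definition, not necessarily the estimator used in the study; the function name `f_is` and the 0/1/2 genotype coding are assumptions made for the example.

```python
def f_is(genotypes):
    """Single-locus F_IS = 1 - H_obs / H_exp (illustrative sketch only).

    genotypes: alternate-allele counts per individual (0, 1 or 2).
    The name and the input coding are assumptions for this example.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")

    p = sum(genotypes) / (2 * n)                      # alternate allele frequency
    h_obs = sum(1 for g in genotypes if g == 1) / n   # observed heterozygosity
    h_exp = 2 * p * (1 - p)                           # Hardy-Weinberg expectation
    if h_exp == 0:
        raise ValueError("site is monomorphic; F_IS is undefined")
    return 1 - h_obs / h_exp
```

Values near zero indicate genotype proportions close to Hardy-Weinberg expectations, positive values a deficit of heterozygotes (as expected under inbreeding), and negative values an excess.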

The lowest cross-validation (CV) score of ADMIXTURE is observed at K = 11, but no additional ancestries appear in Polynesians after K = 10, which has the second lowest CV score (Fig. 2a, Supplementary Figs S1 and S2). At K = 10, a dark blue component appears that is almost fixed in the Kankanaey of northwestern Luzon. The distinctive and uniform profiles of additional ISEA, Melanesian, and East Asian ancestries in two (Tonga and Samoa) out of four, otherwise very closely related, Polynesian groups hint that these may be the result of an old admixture process, rather than genetic drift, extreme bottlenecks or algorithmic artifacts. In contrast, the noticeably uneven distribution of the East Asian (yellow) and western European (grey) ancestry components within the profiles of the Leeward Society individuals (Fig. 2b) is consistent with recent historical admixture events (see haplotype-based admixture analysis below).

The outgroup f3 (ref. 36) allele-sharing plot shows the length of a phylogenetic branch shared between two study populations and African Yoruba. For the Leeward Society Isles (Supplementary Fig. S3, Supplementary Table S7), the f3 allele-sharing results are consistent with a most recent evolutionary history shared with Samoa, Tahiti, and Tonga. They also suggest that the Kankanaey of the Philippines and Taiwanese aborigines are the next closest populations to all four Polynesian groups. These results remain robust to the different SNP subsets or population clustering schemes used in the present study (Supplementary Figs S3, S4, Supplementary Table S7). In contrast, the f3 admixture plots (Supplementary Fig. S5, Supplementary Table S7), which detect the presence of admixture in a study population from two reference groups, display different results for western and eastern Polynesia. These differences could be explained by a reduced effective population size for eastern Polynesians, caused by bottlenecks during the initial settlement process, or because Tonga and Samoa have experienced additional admixture since they last shared a common ancestral gene pool with Tahiti and the Leeward Society Isles.
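For readers unfamiliar with the statistic, the following is a minimal sketch of how an f3 value can be computed from population allele frequencies: the mean over SNPs of (c - a)(c - b). It deliberately omits the finite-sample bias correction and the block-jackknife standard errors that published analyses rely on, and the function name `f3` and its list-based inputs are assumptions for illustration, not the authors' pipeline.

```python
def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b: equal-length lists of allele frequencies (0..1) for the same SNPs.
    Illustrative only: no finite-sample correction, no standard errors.
    """
    if not (len(c) == len(a) == len(b)):
        raise ValueError("frequency lists must have equal length")
    if len(c) == 0:
        raise ValueError("at least one SNP is required")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


# Outgroup form: C is the outgroup (e.g. Yoruba); larger values mean that
# A and B share a longer branch since their split from the outgroup.
shared_drift = f3([0.0, 0.0], [0.5, 0.5], [0.5, 0.5])

# Admixture form: C is the test population; a negative value is the
# signal of C being a mixture of groups related to A and B.
admixture_signal = f3([0.5], [0.0], [1.0])
```

In the toy inputs, the test population with a frequency midway between two sharply differentiated references gives a negative value, which is the pattern the admixture form of the test looks for. Strong drift in the test population can mask this signal, which is one reason a bottlenecked group may not show negative f3 values.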

The unsupervised fineSTRUCTURE (FS) analysis of haplotypes³⁷ placed individuals into genetic clusters that include: Philippine groups from lowland Luzon, Palawan, and Visayas (‘Philippines 1’), Malaysia, Sulawesi, East Asia, northern Melanesia (Bougainville), New Guinea, and western Europe (Supplementary Fig. S6). The GLOBETROTTER (GT) analysis³⁸ produced strong statistical support for two separate episodes of admixture involving the ancestors of the Leeward Society Islanders (Fig. 3, Supplementary Table S8). The first represents an average contribution of ~6% western European ancestry, which is dated to 1749–1803 CE. This is consistent with documented contact during Cook’s three voyages of exploration¹, which took place 1768–71, 1772–75 and 1776–80. The second episode is estimated to have occurred in an interval from ~1,200 to 1,700 BP (229–725 CE), and is composed of a minor component (~17%), comprising mainly northern Melanesian and New Guinea sources, and a major one (~83%), in which the largest contributions are attributable to the ‘Philippines 1’, Sulawesi, and Malaysian clusters. The chronology indicates that this episode occurred prior to the earliest widely accepted radiometric dates for the permanent settlement of eastern Polynesia, which centre on ~950 BP and come from archaeological sites on Rai’atea in the Leeward Society Isles^26,27. In addition, the presence of northern Melanesian ancestry in the minor component of the second (older) episode of admixture (~8% of the genome) reflects some genetic contact with this region for the ancestors of the Leeward Society Islanders prior to 1,200–1,700 BP.
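The paragraph above quotes the older admixture interval both in years BP and in calendar years. A small sketch of the conversion follows, assuming the usual radiocarbon convention that "present" is 1950 CE; the text does not state which reference year the authors used, so the constant and the helper names `ce_to_bp` and `bp_to_ce` are assumptions for this example.

```python
# Assumed convention: "Before Present" counts back from 1950 CE.
BP_REFERENCE_YEAR = 1950


def ce_to_bp(year_ce):
    """Convert a calendar year (CE) to years before present."""
    return BP_REFERENCE_YEAR - year_ce


def bp_to_ce(years_bp):
    """Convert years before present to a calendar year (CE)."""
    return BP_REFERENCE_YEAR - years_bp


# The quoted interval 229-725 CE under this convention:
older_bound_bp = ce_to_bp(229)    # 1721
younger_bound_bp = ce_to_bp(725)  # 1225
```

Under that assumption, 229–725 CE corresponds to 1,721–1,225 BP, which rounds to the ~1,700–1,200 BP quoted in the text, and the ~950 BP radiometric dates from Rai’atea correspond to about 1000 CE.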

In order to investigate the presence of the northern Melanesian contributions further, we performed a GT analysis on different subsets of Leeward Society Islanders as recipient groups (Supplementary Fig. S7, Supplementary Table S8). The results produced a spread of dates for the older episode of admixture, but always partitioned the northern Melanesian contribution into both sides of the admixture episode with point estimates ranging between ca 1,200 and 1,850 BP. Some of the variability in the dating may be due to the heterogeneous distribution of what appears to be recent admixture with people of East Asian ancestry (Figs 2b, S7, Supplementary Table S8), but the result is robust to variations in the makeup of the recipient group. This result, therefore, provides important evidence for either a period of migration from northern Melanesia into the ancestors of eastern Polynesians during the 3rd millennium BP, or a process of biological admixture taking place during the LCC period in northern Melanesia prior to the pioneering settlement of Polynesia ~3,000 BP.

A further insight from the FS and GT analyses of haplotypes is the clear delineation between possible donor groups within the Philippine palette of populations. This excludes the Aeta, Batak, and Kankanaey clusters from any significant contribution to the population ancestral to the Leeward Society Isles (Figs 3, S7, Supplementary Table S8). The Philippine populations from the regions of Luzon, Palawan, and Visayas form a ‘Philippines 1’ cluster, which contributes nearly 40% of the genome of the Leeward Society Islanders. The apparent discrepancy with the analysis of unlinked SNPs (Supplementary Figs S3 and S4), which indicates the Kankanaey as being closest to the Leeward Society Isles, may be caused by the two methods measuring different aspects of the underlying genetic structure. In addition, ascertainment bias inherent to genotyping array data can affect the allele-sharing statistics. The GT analysis, in contrast, is based on combinations of linked markers, and is consequently more powerful and robust for identification of complex admixture events³⁸.

Uniparental haploid loci: mtDNA

Ninety-six percent of Leeward Society Isles mitochondrial lineages belong to the haplogroup (hg) B4a1a1 typical of Polynesian-speaking populations. A PCA plot based on frequencies of mtDNA B4a1a lineages (Supplementary Fig. S8) places the Leeward Society Islands closest to Ontong Java (central northern Polynesian outlier, Fig. 1a) with the major western Polynesian populations of Tonga and Samoa among the most distant from eastern Polynesians. The Bayesian estimate of the time to the most recent common ancestor (MRCA) for well-supported clades of mitochondrial hg B4a1a1 (Supplementary Table S10A) is consistent with more than a third being significantly older than the first settlement of southern Melanesia and western Polynesia. The diversity-based age for B4a1a1 among Polynesian-speaking groups at ~5,700 BP (4,100–7,700 BP) is substantially older than the age of the LCC in northern Melanesia.

Uniparental haploid loci: Y chromosome

The major Y chromosome haplogroup in the Leeward Society Isles is C2a1-P33 (67%), a sub-clade of C2a-M208, as it is throughout eastern Polynesia¹⁶, including the New Zealand Maori (52%), and the central northern outlier of Ontong Java (Supplementary Table S9B). Many of the haplotypes from eastern Polynesia (Leeward Society Isles, Tahiti, New Zealand Maori), and Ontong Java, are found near to the root of the hg C2a-M208 phylogenetic network (Supplementary Fig. S9). The PCA of this haplogroup and its sub-clades, including C2a1-P33, places the Leeward Society Islanders in closest overall proximity to individuals from the central northern Polynesian outlier of Ontong Java rather than those from Tonga and Samoa (Supplementary Fig. S10). The MRCA of the four target-sequenced Society Isles Y chromosomes provides an age of ~2,100 BP for the hg C2a1-P33 (Supplementary Fig. S11, Supplementary Table S10B).

Another Y chromosome hg, O3a’i-P164, represents a possible ISEA contribution to male lineages among Polynesians and occurs in western Polynesia at significant levels (35%). In the Leeward Society Islands O3a’i-P164 has a frequency of 11% (Supplementary Table S9B). Seven of the eight individuals belong to the O3i-B451 clade, which so far has only been identified among Austronesian speakers in ISEA (https://doi.org/10.1101/gr.186684.114)⁴⁰. All of these seven also typed positive for the downstream B450 marker and share an MRCA at ~5,700 BP with a Sama-Bajaw individual from Sulawesi (Supplementary Fig. S11, Supplementary Table S3). They also carry a rare triplication event at DYS385, which is present among individuals, not genotyped beyond the ancestral positions of O3′7-M122 and O3′6-M324, from New Zealand (Supplementary Table S3), western Polynesia, Tikopia (southern Polynesian outlier) and Fiji (southern Melanesia) (https://doi.org/10.1093/molbev/msr186)⁴¹. These individuals, therefore, likely belong to hg O3i.

The Y chromosome diversity of the Leeward Society Isles is completed by hg O1-M119 and hg O6a-JST002611, which are prevalent in Taiwanese aborigines and East Asia⁴², respectively, and hg S2a-P79 (formerly K3-P79) (see Supplementary Fig. S11 and its legend online, Supplementary Table S9B). The latter occurs on average at a frequency of 6% in eastern Polynesia, western Polynesia, and Ontong Java (central northern outlier). The available high-resolution STR data place the S2a-P79 haplotypes of the Leeward Society Isles in close proximity to those from New Zealand Maori and Ontong Java (Supplementary Fig. S12).

Discussion

The genomes of contemporary Polynesian-speaking groups appear to be a mosaic of components derived from the coming together of long-diverged sources from ISEA and the region of northern Melanesia/New Guinea^13,25. How this came about is the subject of considerable debate^9,12,30. Our haplotype-based analysis of high-density autosomal SNPs indicates that, for the ancestors of the Leeward Society Isles, most of this admixture occurred during a period spanning ~1,200–1,700 BP. These genetic dates are nearly identical to those of a previous analysis that used a different method and amalgamated haplotype data from western (Tonga) and outlier (Rennell, Bellona and Tikopia) Polynesia²⁵. They contrast with older dates obtained using different data sets and methods, which vary from ~7,000 BP to ~2,700 BP^13,43. The method used here has been demonstrated to accurately identify known historical admixture events during the past 2,000 years^38,44, but it is also possible that other analytical approaches may provide insights into a different part of the genealogical process.

The presence of this demographic signal in the data from the Leeward Society Isles is important, since it is consistent with archaeological evidence for a late settlement model for eastern Polynesia ~950 BP, and, therefore, the linguistic sub-grouping of Wilson³² (Fig. 1b). The substantial body of linguistic evidence supporting this sub-grouping includes over 200 lexical and grammatical innovations that are shared between the languages of eastern Polynesia and the central northern outliers (Luangiua, spoken on Ontong Java, Takuu, Nukumanu and Nuguria). Moreover, these innovations are stepwise and directional in nature, a pattern that is only consistent with a west-to-east movement of people, tracing the origins of eastern Polynesians to central northern outlier Polynesia, rather than Samoa^32,33. The principal component analysis and phylogenetic reconstruction of the Polynesian mtDNA B4a1a1 sub-groups and C2a1-P33 paternal lineages (Supplementary Figs S8–S10, S12) are consistent with this linguistic evidence for the recent settlement of eastern Polynesia from the central northern outliers.

A further important contribution to the debate on Polynesian origins is the partitioning of northern Melanesian ancestry into both sides of the admixture episode taking place ~1,200–1,700 BP in the ancestors of the Leeward Society Islanders (Fig. 3). In particular, the contribution of ~8% of this ancestry to the side containing the ISEA sources is significant, because it suggests an earlier episode of admixture affecting the population ancestral to the Leeward Society Islanders. This result is robust to analysis by subsets of the data (Supplementary Fig. S7), but it is not possible to determine how and when this northern Melanesian ancestry entered into the ancestral gene-pool of the Leeward Society Islanders. It, therefore, remains feasible that, for some groups of Austronesian speaking migrants from ISEA, genetic admixture accompanied cultural interaction during the formative period of the LCC in the Bismarck archipelago ~3,450–3,250 BP^8,15, which precedes the settlement of southern Melanesia and western Polynesia by at least 200 years^{https://doi.org/10.1086/662201}

The position of the Kankanaey as the closest group to the Leeward Society Islanders in the outgroup f3 allele-sharing plots (Supplementary Figs 3 and 4), while not making any significant contribution to their genomes in the GLOBETROTTER^{https://doi.org/10.1126/science.1243518, 38} (GT) results (Fig. 3), is potentially very revealing. It is arguable that one or other result is misleading as an effect of severe genetic drift. However, this hypothesis requires the concurrent excess retention of either SNPs (should f3 results be taken at face value), or haplotypes (should we trust only GT), typical of those found in the Leeward Society Islands today, which is statistically unlikely. Alternatively, while the Kankanaey are indeed the single best remaining proxy for the ancestors of the Leeward Society Islanders, the ‘Philippine 1’ cluster is admixed with a population that is genetically closer to those ancestors than the Kankanaey are. Specifically, although the ‘Philippine 1’ cluster has received extensive admixture with other groups, which lowers their f3 score, they retain the best proxy for the haplotypic variation found in the original ancestors of the Leeward Society Islanders. This hypothesis is preferred because the GT approach models the recipient population using donors who are reconstructed rather than observed, allowing for subsequent admixture in the donor groups^{https://doi.org/10.1126/science.1243518}

Within the geographical context of the Philippines, the GT results make sense because the populations making up the other three Philippines clusters are all located in mountainous regions and have languages that are either relics or indicate long-term isolation^{https://doi.org/10.3378/027.085.0316, 46, https://doi.org/10.2307/3622752, 47}. In contrast, the ancestors of the demographic expansion that led to the settlement of Polynesia are anticipated to be part of a recent seafaring tradition. This necessarily would have been based in the coastal regions and could be related to pre-existing trading networks within ISEA that already had links to Melanesia (see Donohue and Denham^{https://doi.org/10.1086/650991, 48} and comments for a discussion of this subject). In this respect, it is interesting to note that the age of the most recent common ancestor of the Y chromosome haplogroup O3i-B451 (5,900–8,100 BP, Supplementary Table S10B), proposed as a marker for the expansion of Austronesian speaking people throughout ISEA^{https://doi.org/10.1371/journal.pone.0175080, 40}, exceeds the proposed timing for the transfer of the Neolithic from Taiwan (4,200 BP)^{https://doi.org/10.1086/658181}

Within the Society Islands themselves, maternally inherited mitochondrial DNA lineages are strongly biased towards variants thought to be associated with the dispersal of Austronesian speakers (96% B4a1a1, Supplementary Table S9A). The best candidate for a contribution from the Austronesian speaking diaspora of ISEA to the male lineages of the Leeward Society Islands is haplogroup O3i-B451. However, it contributes less than 10% to the Leeward Society Islands paternal lineages (Supplementary Table S3). The majority of Y chromosome lineages have proposed ancestral associations with modern Papuan groups (C2a1-P33 and S2a-P79, Supplementary Table S9B)^{https://doi.org/10.1016/j.ajhg.2007.09.010, 17}. This sex bias holds across Polynesia and is observed as far back as Island Southeast Asia^{https://doi.org/10.1098/rspb.2009.2041, 49}, and may have resulted from the practice of exogamy and matrilocal post-marital residence among early Austronesian speaking groups^{https://doi.org/10.1016/j.jaa.2011.06.004, 50}. A sex bias is also reflected in the nuclear genomes of Austronesian speakers and appears to be a characteristic of the Pacific region as a whole^{https://doi.org/10.1038/nature19844, 51}.

In conclusion, the picture of Polynesian origins emerging from the present study is one of a more complex demographic history than that originally envisioned in the phylogenetic model of cultural evolution^{https://doi.org/10.1086/203547, 12}. The results presented here provide support for models based on inter-connectivity among, and within, the different parts of the Pacific, rather than their relative isolation^{8, https://doi.org/10.1086/204604, 9}. The new data concur with a late chronology for the settlement of eastern Polynesia, which fits better with the linguistic arguments and haploid data linking this region to the northern central Polynesian outliers. With respect to the ultimate origin of the Island Southeast Asian ancestry found in the Leeward Society Isles, the results indicate a significant role for the lowland region of the Philippines, as predicted by Johann Reinhold Forster in his seminal comparative study of languages conducted more than two hundred and forty years ago.

Materials and Methods

New samples

Thirty-six samples from the Leeward Society Islands were previously reported for Y chromosome genotypes^{https://doi.org/10.1016/j.gene.2014.03.005, 52} and 44 new samples are reported here, making a total of 81 male individuals from three islands: Bora Bora (n = 14), Rai’atea (n = 36), Taha’a (n = 31). In addition, 49 male Maori individuals sampled in New Zealand are reported for Y chromosome genotypes (Supplementary Table S1A). All samples were collected with informed consent and with the approval of the institutional review boards at the University of Colorado, U.S.A., and La Trobe University, Melbourne, Australia. All experiments were performed in accordance with the relevant guidelines and regulations of the collaborating institutions.

mtDNA analysis

The DNA extracts of 81 Leeward Society Islanders (Supplementary Table S1A) were genotyped for membership of mitochondrial haplogroups typical of eastern Polynesia^{https://doi.org/10.1371/journal.pone.0035026, 54, https://doi.org/10.1371/journal.pone.0047881, 55}. Nomenclature followed that of Phylotree.org, mtDNA tree Build 17^{https://doi.org/10.1002/humu.20921, 56} (Supplementary Table S2). This resulted in 78 individuals allocated to the hg B4a1a1 and three individuals to hg Q. The control region (nps 57–372 and nps 16024–16526) was sequenced for 36 samples that could not be allocated to the known sub-clades of these two haplogroups, and 25 samples were further selected for complete mitochondrial genome sequencing. Using information from the full sequences, additional nucleotides were typed by Sanger sequencing to complete the haplogroup assignment.

The 25 newly generated complete mtDNA sequences were merged with published data (see Supplementary Table S4), and a Bayesian phylogenetic approach in BEAST 1.8.4^{https://doi.org/10.1093/molbev/mss075, 57} was used to analyse two data sets comprising genomes belonging to hgs B4a1a1 (n = 442) and M29/Q (n = 111), respectively (Supplementary Table S4). The data sets were partitioned into the D-loop, rRNA genes, and first, second, and third codon positions of the 13 protein-coding genes. Each data subset was assigned an independent model of nucleotide substitution, selected using the Bayesian information criterion in PartitionFinder^{https://doi.org/10.1093/molbev/mss020, 58}. Four demographic models for the tree prior were compared: constant size, exponential growth, logistic growth, and Bayesian skyride coalescent^{https://doi.org/10.1093/molbev/msn090, 59}, together with two models of rate variation across lineages: strict clock and uncorrelated lognormal relaxed clock^{https://doi.org/10.1371/journal.pbio.0040088, 60}. Marginal likelihoods were calculated using path sampling with 25 power posteriors, with samples drawn every 2 M MCMC steps across a total of 50 M steps. For the B4a1a1 analysis, the best combination was a strict clock with a logistic growth coalescent model. For the M29/Q analysis, the combination of strict clock and skyride model is reported because the demographic model showed a clear change in population size.

To calibrate the estimate of the timescale, a normal prior for the mutation rate was specified (mean 2.14 × 10⁻⁸ mutations/site/year, standard deviation 2.87 × 10⁻⁹)^{https://doi.org/10.1093/molbev/msu222, 61}. The posterior distributions of parameters, including the genealogy, were estimated using MCMC sampling. Samples were drawn every 5,000 steps over a total of 50 M MCMC steps. To check for convergence, each analysis was run in duplicate. After checking for acceptable MCMC mixing, the samples from the two runs were combined. Sufficient sampling was checked by computing the effective sample sizes of all parameters, which were found to be greater than 200.
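To give a feel for how tight the mutation-rate prior described above is, the sketch below computes its central 95% interval. It only uses the mean and standard deviation quoted in the text; the function name and the choice of a 95% interval are illustrative assumptions, not part of the authors' BEAST configuration.

```python
from statistics import NormalDist

# Values quoted in the text (mutations/site/year).
PRIOR_MEAN = 2.14e-8
PRIOR_SD = 2.87e-9


def prior_interval(mean=PRIOR_MEAN, sd=PRIOR_SD, mass=0.95):
    """Return the central interval of a normal prior holding `mass` probability."""
    if not 0 < mass < 1:
        raise ValueError("mass must lie strictly between 0 and 1")
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


if __name__ == "__main__":
    low, high = prior_interval()
    print(f"95% prior interval: {low:.3e} to {high:.3e} mutations/site/year")
```

Under these values the interval runs from roughly 1.58 × 10⁻⁸ to 2.70 × 10⁻⁸ mutations/site/year, so the prior allows the rate to vary by about a quarter either side of its mean.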

To perform a principal component analysis of the haplogroup B4a1a data, 442 complete Polynesian and Melanesian mtDNA genomes used for the BEAST analysis (Supplementary Table S4) were combined with the additional complete or partially complete mtDNA sequences from Melanesia^{https://doi.org/10.1016/j.ajhg.2014.03.014, 55} (n = 159), New Zealand Maori^{https://doi.org/10.1371/journal.pone.0035026, 53} (n = 20), and the remaining hg B4a1a1 haplotypes from the Leeward Society Islands (n = 55). Haplotypes were assigned to sub-clades of hg B4a1a by the HaploGrep2 software^{https://doi.org/10.1093/nar/gkw233, 62} and manual inspection of sequences. The resulting haplogroup frequencies were used to produce a population level PCA in R⁶³ (Supplementary Fig. S8).

Y chromosome analysis

Eighty-one male individuals from the Leeward Society Islands were genotyped for Y chromosome haplogroup specific SNPs by Sanger sequencing in a hierarchical manner, including new branch-defining SNPs from sub-clades O3i-B451 and O3i-B450. Unless otherwise stated, all nomenclature follows that of Karmin et al.^{https://doi.org/10.1101/gr.186684.114, 39} to avoid potential confusion. The Y chromosomes of nine individuals belonged to haplogroups typical of Europeans (G, J, and R) and were not subject to further analysis. In addition, the Y chromosomes of 49 Maori males sampled in New Zealand were genotyped (Supplementary Table S1A). These samples were also hierarchically tested to a level of phylogenetic resolution equivalent to the main haplogroup level in the Leeward Society Islanders (Supplementary Table S3).

The 72 Leeward Society Isles samples with non-European Y chromosomes, together with the 49 Maori samples, were further genotyped for 23 Y chromosome short tandem repeats (Y-STRs; Supplementary Table S3). After merging with comparative data from other sources^{https://doi.org/10.1353/hub.2008.0004, 42} and excluding individuals with partial results, this produced a data set of 15 microsatellites, which was used to construct phylogenetic networks of hgs C2a-M208 and K*-M9 using the reduced median algorithm in Network 4.6.1.1⁶⁴ (Fluxus-Engineering). The same 15 microsatellites occurring on the C2a-M208 background were used to perform PCA in R⁶³, in order to examine the relationship of eastern, western and outlier Polynesian populations for this key haplogroup within Polynesian Y chromosome diversity (Supplementary Fig. S10).

Next, seven individuals belonging to hgs C2a1-P33 (n = 4), K-M9 (n = 1), O3i-B450 (n = 1) and O6a-JST002611 (n = 1) were selected for target-capture re-sequencing using the BigY service (Gene By Gene Ltd) (Supplementary Table S1A). The paired-end reads were mapped to the GRCh37 human reference sequence. The reconstruction and rooting of the phylogeny of the seven samples from the Leeward Society Islands used 56 sequences published in Karmin et al.^{https://doi.org/10.1101/gr.186684.114, 39} and 17 hg O individuals from the 1000 Genomes Project^{https://doi.org/10.1038/nature11632, 65} (Supplementary Table S5). After filtering the data, the overlap between the data sets was extracted and the ‘re-mapping filter’ based on modeling the poorly mapped regions was applied, as described in Karmin et al.^{https://doi.org/10.1101/gr.186684.114, 39}. This resulted in 6.2 Mbp of usable sequence of the non-recombining male-specific Y chromosome region, and sites with a minimum 95% call rate were used in the analysis.

A Bayesian phylogenetic approach in BEAST^{https://doi.org/10.1093/molbev/mss075, 57} was used to analyse a final data set comprising 7669 SNPs from these 80 individuals. To correct for ascertainment bias, we added constant sites corresponding to the nucleotide composition across the remainder of the chromosome. The four demographic models and other details of the settings used in the MCMC analyses matched those used for analyses of mtDNA. The best-fitting model was exponential growth, which had a log Bayes factor of 8.079 compared with the next-best model (Bayesian skyride). To calibrate the estimate of the timescale, a mutation rate of 8.71 × 10⁻¹⁰ mutations/site/year^{https://doi.org/10.1038/ng.3171, 66} was specified.

Autosomal analysis

A set of 713,014 SNPs was screened using the HumanOmniExpress-24 BeadChip array in 30 individuals from the Leeward Society Isles (Supplementary Table S1A). Twenty-six samples yielded high genotyping success rates (<5% missing genotypes), and 686,565 autosomal SNPs, with less than 5% missing data, were kept for further analyses. Inference of cryptic relationships between samples was performed using KING v. 1.4^{https://doi.org/10.1093/bioinformatics/btq559, 67} and no first-, second- or third-degree relatives were detected. A single sample clustered together with Europeans in the fineSTRUCTURE^{https://doi.org/10.1371/journal.pgen.1002453, 37} run (see below), and was excluded from all population level analyses (Supplementary Table S1B).
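The two 5% missingness thresholds described above can be illustrated with a small sketch. The data layout (one list of calls per sample, with `None` marking a missing genotype), the function names, and the order of the two filters (samples first, then SNPs) are assumptions made for illustration; the text does not state how the filtering was implemented.

```python
def missing_rate(calls):
    """Fraction of missing calls (None) in a list of genotype calls."""
    return sum(call is None for call in calls) / len(calls)


def filter_by_missingness(matrix, max_missing=0.05):
    """Return (kept_sample_indices, kept_snp_indices).

    matrix: list of samples, each a list of genotype calls per SNP.
    Samples with a missing rate of max_missing or more are dropped first;
    SNP missingness is then evaluated on the retained samples only.
    """
    kept_samples = [
        i for i, row in enumerate(matrix) if missing_rate(row) < max_missing
    ]
    if not kept_samples:
        return [], []
    n_snps = len(matrix[0])
    kept_snps = [
        j
        for j in range(n_snps)
        if missing_rate([matrix[i][j] for i in kept_samples]) < max_missing
    ]
    return kept_samples, kept_snps


if __name__ == "__main__":
    # Toy data: 3 samples x 40 SNPs, genotypes coded 0/1/2.
    complete = [0] * 40
    one_missing = [None] + [1] * 39          # 2.5% missing, kept
    many_missing = [None] * 10 + [2] * 30    # 25% missing, dropped
    samples, snps = filter_by_missingness([complete, one_missing, many_missing])
    print(samples, len(snps))
```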

The study dataset was produced by merging newly generated Societies data with samples from mainland and ISEA, Melanesia, and Polynesia^{https://doi.org/10.1186/gb-2011-12-2-r19, 68, https://doi.org/10.1126/science.1153717, 69, https://doi.org/10.3378/027.085.0313, 70, https://doi.org/10.1038/ejhg.2016.60, 71, https://doi.org/10.1073/pnas.1321860111, 72, https://doi.org/10.1126/science.1211177, 73}, and with 25 random samples from multiple large continental reference populations from the 1000 Genomes Project^{https://doi.org/10.1038/nature11632, 65} (Fig. 1a, Supplementary Table S1B). Two independent datasets were produced. Firstly, a dataset comprised of 299,998 SNPs (after excluding SNPs with more than 5% missing data) from 570 samples was used for haplotype-based (fineSTRUCTURE, FS, and GLOBETROTTER^{https://doi.org/10.1126/science.1243518, 38}, GT) analyses, f3 and F_IS statistics. FS/GT requires a much higher density of SNP coverage, which was not possible to achieve while keeping samples from Hudjashov et al.^{https://doi.org/10.1093/molbev/msx196, 44} due to the overlap between the different genotyping arrays used. Secondly, a dataset comprised of 92,972 SNPs (after excluding SNPs with more than 5% missing data) from 739 samples including those from Hudjashov et al.^{https://doi.org/10.1093/molbev/msx196, 44} was used for genotype-based analyses only (ADMIXTURE³⁵, f3, F_IS and PCA) (Supplementary Table S1B). For PCA and ADMIXTURE, only unlinked SNPs with R² < 0.2 were kept; 57,825 SNPs passed this criterion.
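The R² < 0.2 criterion for unlinked SNPs can be sketched as a greedy pruning pass. This is only an illustration: the text does not name the tool or window settings used, so the window of previously kept SNPs, the function names, and the use of the squared Pearson correlation of genotype dosages (0/1/2) as R² are all assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, threshold=0.2, window=50):
    """Greedily keep SNPs whose R² with each recently kept SNP is below threshold.

    genotypes: list of SNPs in chromosomal order, each a list of dosages
    (one value per sample). `window` limits comparisons to the most recently
    kept SNPs, a simplification of the positional windows used by real tools.
    """
    kept = []
    for index, snp in enumerate(genotypes):
        recent = kept[-window:]
        if all(r_squared(snp, genotypes[j]) < threshold for j in recent):
            kept.append(index)
    return kept


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2],  # identical to the first SNP, so it is pruned
        [2, 2, 0, 0, 1, 1],  # R² with the first SNP is 1/16
    ]
    print(ld_prune(snps))
```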

Although there is a substantial overlap between the two datasets used here (including populations from East and Southeast Asia, Philippines, Indonesia and Melanesia) some important differences need to be mentioned. The dataset used for the FS and GT analyses does not include samples from Taiwan, Tonga, Samoa and Tahiti. These four populations are, therefore, only in the dataset used for allele-frequency based analyses. However, the Kankanaey of northwestern Luzon in the Philippines are proposed as a proxy for early Austronesian speakers from Taiwan^{https://doi.org/10.1038/ejhg.2016.60}

[…]^44 (Tonga, Samoa and Tahiti) were further controlled for the presence of cryptic relatedness between samples as described above by using the full SNP dataset from the original publication. In addition to the previously reported lack of first-degree relatives^{https://doi.org/10.1093/molbev/msx196}

Maximum likelihood estimates of the ancestry of individuals were obtained with ADMIXTURE v. 1.30³⁵. Following Cox et al.^{https://doi.org/10.1093/molbev/msw099, 74}, fifty randomly seeded runs were performed for each number of ancestral populations (K = 2 to K = 15), and the results for each K were summarized with CLUMPP v. 1.1.2^{https://doi.org/10.1093/bioinformatics/btm233, 75}. Runs with symmetric similarity coefficient > 0.9 were assigned to the same modal solution, and individual ancestry proportions were averaged across runs belonging to the same mode. The most frequent modal solution is reported.

Autosomal PCA was performed with the smartpca function of EIGENSOFT v. 3.0^{https://doi.org/10.1371/journal.pgen.0020190, 76} with no outlier removal step. F_IS (a measure of inbreeding) was calculated in Genepop v. 4.7.0^{https://doi.org/10.1111/j.1471-8286.2007.01931.x, 77}.
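As a rough illustration of what F_IS measures, the sketch below computes the textbook single-locus value, 1 − H_obs/H_exp, from genotype counts at a biallelic SNP. This is a simplified stand-in and not the estimator reported by Genepop, which is more elaborate; the function name and input layout are assumptions.

```python
def f_is(n_hom_ref, n_het, n_hom_alt):
    """Single-locus F_IS = 1 - H_obs / H_exp for a biallelic SNP.

    H_obs is the observed heterozygote fraction; H_exp = 2p(1 - p) is the
    Hardy-Weinberg expectation given the allele frequency p.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("F_IS is undefined for a monomorphic locus")
    h_obs = n_het / n
    return 1 - h_obs / h_exp


if __name__ == "__main__":
    print(f_is(25, 50, 25))  # Hardy-Weinberg proportions
    print(f_is(50, 0, 50))   # no heterozygotes at all
```

Values near zero indicate genotype proportions close to random mating, while positive values indicate a deficit of heterozygotes, as expected under inbreeding.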

A series of f3 tests were performed with ADMIXTOOLS v. 4.1^{https://doi.org/10.1534/genetics.112.145037, 36}. Firstly, standard f3 statistics were used as a formal test for admixture between all possible combinations of populations in the comparative dataset. Secondly, the outgroup f3 test was implemented as a measure of the shared branch length between each of the Polynesian groups and all other populations. For outgroup f3, the Yoruba population (YRI) from Africa was employed as the outgroup.
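The two uses of f3 described above rest on the same quantity, the average over SNPs of (c − a)(c − b), where a, b and c are allele frequencies in three populations. The sketch below computes that plain average from per-SNP frequencies. It is a simplified illustration and not the ADMIXTOOLS implementation: the sample-size bias correction and block-jackknife standard errors are omitted, and the function name and toy frequencies are hypothetical.

```python
from statistics import mean


def f3(centre, pop_a, pop_b):
    """Uncorrected f3(centre; pop_a, pop_b) from per-SNP allele frequencies.

    Admixture test: centre is the tested population; a clearly negative
    value suggests it is admixed between groups related to pop_a and pop_b.
    Outgroup f3: centre is the outgroup; larger values indicate a longer
    shared branch between pop_a and pop_b.
    """
    if not (len(centre) == len(pop_a) == len(pop_b)):
        raise ValueError("all frequency vectors must cover the same SNPs")
    return mean((c - a) * (c - b) for c, a, b in zip(centre, pop_a, pop_b))


if __name__ == "__main__":
    # Toy frequencies: the centre population sits midway between two sources.
    admixed = [0.5, 0.5]
    source_1 = [0.0, 1.0]
    source_2 = [1.0, 0.0]
    print(f3(admixed, source_1, source_2))  # negative, as expected for a mixture

    # Outgroup form: two populations sharing drift away from the outgroup.
    outgroup = [0.0, 0.0]
    print(f3(outgroup, [0.5, 0.5], [0.5, 0.5]))
```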

To assess the potential bias introduced by two different SNP subsets and sample clustering procedures used here, f3 and F_IS were calculated as follows: (a) using the dataset of ca. 93k SNPs and 739 samples (with data from Hudjashov et al.^{https://doi.org/10.1093/molbev/msx196, 44}) and the original population affiliation; (b) using a dataset of ca. 300k SNPs and 570 samples (without data from Hudjashov et al.^{https://doi.org/10.1093/molbev/msx196, 44}) and the original population affiliation; (c) as per the approach outlined in (b), but using FS-based population grouping (see below and Supplementary Table S1B).

In order to take advantage of the benefits gained from including linkage information when working with high-density genetic data, we employed the fineSTRUCTURE (FS)^{https://doi.org/10.1371/journal.pgen.1002453, 38} framework. Genotypes were first phased with SHAPEIT v. 2^{https://doi.org/10.1038/ncomms4934, 78} using the HapMap phase II b37 recombination map^{https://doi.org/10.1038/nature06258, 79}. Samples were assigned to genetic groups using fineSTRUCTURE v. 2 with default parameters; 7.5 M MCMC iterations were performed in total. The population dendrogram produced by FS was manually inspected and samples were assigned to 21 individual groups.

After excluding a single Leeward Society Isles sample with a very high proportion of European ancestry, we inferred admixture with GT using the remaining combined Societies sample set (n = 25). To gain insight into the admixture variance within the Leeward Society Islands, we performed additional GT runs using the individual clades of the FS dendrogram. For the latter approach, only clades with a minimum of five samples were included, and in one case (‘Societies 3’) the clade was amalgamated with its closest direct neighbor to pass the sample-size threshold. In total, 20 out of 25 samples were used in the individual GT runs. GT analysis was performed following the ‘full’ algorithm protocol^{https://doi.org/10.1126/science.1243518}, where each recipient Society genome could copy chunks from the genomes of all other non-Societies donor clusters. One hundred bootstraps were used to assess the statistical significance of the admixture event and uncertainty of the inferred dates. Admixture dates were converted to years using the formula (x + 1) * 28^{https://doi.org/10.1126/science.1243518}, where x is the number of generations since the admixture event and the generation interval is 28 years^{https://doi.org/10.1002/ajpa.20188}.
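The date conversion described above is simple arithmetic, and a minimal sketch may make it easier to check. The function name and the example generation counts below are hypothetical and are not taken from the paper; only the formula (x + 1) * 28 and the 28-year generation interval come from the text.

```python
# Generation interval in years, as stated in the methods text.
GENERATION_INTERVAL_YEARS = 28


def generations_to_years(generations, interval=GENERATION_INTERVAL_YEARS):
    """Convert generations since admixture to years using (x + 1) * interval."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (generations + 1) * interval


# Hypothetical generation counts, chosen only to illustrate the arithmetic;
# they are NOT results reported in the paper.
example_generations = [0, 10, 49.5]
example_years = [generations_to_years(x) for x in example_generations]
```

The same function could be applied to each bootstrap estimate in turn to express the uncertainty of an inferred date in years rather than generations.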

Data Availability

The genotyping SNP and STR data for mitochondrial and Y chromosomal DNA generated during the current study are included in the published article and its Supplementary Information files. The complete mitochondrial genome sequences generated during the current study are available from GenBank (https://www.ncbi.nlm.nih.gov/genbank/) under the accession numbers MG244202–MG244226. The seven novel Y chromosome sequences are available from the European Nucleotide Archive (https://www.ebi.ac.uk/ena) under the accession number PRJEB22729. The autosomal data produced from 30 Leeward Society Islanders are available from the corresponding author on reasonable request.

https://www.nature.com/articles/s41598-018-20026-8

Topic Tags

#polynesia #austronesia

12 Replies

Posts: 1462

Dyno-Mite

Dec 03, 2022 1:28 am

(@dyno)

Noble Member

Joined: 4 years ago

their DNA clearly points to us

2 Replies

MeLeona

(@meleona)

Joined: 7 years ago

Member

Posts: 806

May 05, 2023 2:30 am

Reply to

Dyno-Mite

@dyno sadly this information is not taught in school

Angelo

(@angelo)

Joined: 4 years ago

Estimable Member

Posts: 159

May 05, 2023 4:16 pm

Reply to

Dyno-Mite

@dyno who is "us"? A lot of us aren't Igorot

Posts: 3212

Prau123

Dec 03, 2022 6:11 am

(@prau123)

Famed Member

Joined: 6 years ago

Some interesting results from Figure 2 (Admixture Analysis) and at K=10,

1) The blue component emerges and peaks among the Kankanaey, an Igorot people of northern Luzon. The Kankanaey samples are almost entirely blue, which makes them a good reference for this component. I'm assuming the blue component is representative of the Austronesian expansion, and that the Kankanaey are truly representative of the original Austronesians.

2) A cyan component, which peaks among the Lebbo, a Bornean group, is likely representative of the Pre-Austronesian Southeast Asian component. The Lebbo are therefore the best representatives of Pre-Austronesian Southeast Asians! What's interesting is that the cyan component predominates in most Island Southeast Asian groups, including the Philippine lowland groups (classified as "Other"). Since the vast majority of Filipinos belong to the Philippine lowland groups, this means that even the vast majority of Filipinos are mostly Pre-Austronesian Southeast Asian, and not Austronesian! It should be mentioned that the vast majority of Filipinos also have a smaller East Asian component and even smaller Negrito and Polynesian components; some even have a small Papuan component.

3) The blue component (Austronesian/Kankanaey component) is a small minority in most of Island Southeast Asia, which we all know is predominantly Austronesian speaking. This suggests that the Austronesian expansion was largely linguistic, with a far smaller genetic influence. The Javanese have almost none of this blue component. The Malaysians have a very small amount of the blue component.

4) The Samoans and the Tongans, who represent the Western Polynesians in this study, have a small amount of the blue component (Austronesian/Kankanaey component), about 10-12% of their genome. But what's surprising is that the Samoans and Tongans have an even smaller amount of the cyan component (Pre-Austronesian Southeast Asian/Lebbo component), which suggests that it was Austronesians (in the form of Samoans and Tongans) who were the first to settle Polynesia, and not the Pre-Austronesian Southeast Asians. It took Austronesians for humans to finally make it further into the Pacific, specifically Western Polynesia. Before the Austronesians, there was no one in Western Polynesia (or in Polynesia in general).

5) The Tahitians and the Polynesians of Leeward Society (both are in French Polynesia) have predominantly the green component which is the Polynesian component, and I'm assuming this means that Tahitians and the people of Leeward Society are probably the best examples of this Polynesian component. Tahitians and the people of Leeward Society are more Polynesian than Samoans and Tongans. This green component (Polynesian component) appears early at K=5, but before that at K=4 it was predominantly colored light yellow with a small minority of purple (20% or less). The light yellow component peaks among the Japanese and Han Chinese, and therefore the light yellow component is the East Asian genetic component. The purple component peaks among the New Guinea (Papuan) samples, and is representative of New Guinea Papuans. What this suggests is that the Polynesian component (green component) has a stronger affinity to the East Asian component (light yellow component) than it does to the New Guinea Papuan component (purple component). It doesn't mean that a pure Polynesian (such as a Tahitian or Leeward Society Polynesian) is 80% East Asian and 20% New Guinea Papuan. But it does roughly mean that 80% of their genome is closer to the East Asian component, and 20% of their genome is closer to the New Guinea Papuan.

6) The fact that Tahitians and Leeward Society Polynesians have none of the blue component (Austronesian/Kankanaey) and none of the cyan component (Pre-Austronesian Southeast Asian/Lebbo) might suggest that pure Polynesians migrated to French Polynesia on their own. These Polynesians, wherever they originally came from before arriving in French Polynesia, had not intermixed with either Austronesians or Pre-Austronesian Southeast Asians (as defined in this study), but obviously adopted the Austronesian language and perhaps culture and technology. It may also suggest that the migration to French Polynesia did not originate in Samoa or Tonga; otherwise, they would carry the Austronesian (blue component) and Pre-Austronesian Southeast Asian (cyan component) genetic signatures, which the Samoans and Tongans do carry.

Posts: 3212

Prau123

Dec 03, 2022 8:45 am

(@prau123)

Famed Member

Joined: 6 years ago

Some other interesting things from Figure 2 Admixture Analysis and specifically at K=10

7) The brown component peaks among the Aeta, a Negrito group in Luzon, and therefore this brown component will be referred to as the Aeta component. The Aeta samples used were largely unadmixed, meaning they carry virtually only the brown component. Interestingly, the Batak people of Northern Sumatra carry a significant amount of this brown component, approximately 20%. There are Malaysian Negritos, such as the Semang, in the Malay Peninsula, which isn't too far from Northern Sumatra. The Malaysian Negritos were not included in this study, but perhaps they too would carry a significant amount of the brown Aeta component. South Asians also carry the brown component at around 3-5%, and so do the sampled Australian Aborigines at around 5-7%, and the Bougainville Oceanians at around 3-5%. This component must be very old, given its widespread appearance and the fact that it first appears at K=6, just before the Native American component (dark yellow component), which first appears at K=7. The Philippines has time and time again proven itself to be a sanctuary for ancient genomic ancestry, with largely unadmixed examples of that ancestry, as exemplified by the Aeta with their mostly if not entirely pure brown component. This is due to the archipelagic geography and location of the Philippines.

8) As mentioned earlier, the Native American component appears at K=7 as a dark yellow component. The dark yellow component is found in many East Asian and some Mainland Southeast Asian groups, with the East Asians having comparatively more than the Mainland Southeast Asians. The amount ranges as high as 15% among the Japanese and Han Chinese, and perhaps around 3-5% among the Vietnamese and Myanmar people. But this component is not found among Island Southeast Asian groups, not even among the Kankanaey, who represent the true Austronesians. The Austronesians supposedly originate in Southeastern China and along the coast of Eastern China in general. This may suggest that the Native American component migrated for the most part within inland Eastern Eurasia, or at least away from coastal China, but it did make its way as far south as Myanmar and Vietnam, which do have coasts of their own.

9) A European (CEU) gray component forms the majority among South Asians (STU), and is found in small but significant amounts among the Myanmar samples, and in even smaller amounts in some Malaysian and Sumatran samples (perhaps via South Asian intermixing?). Native Americans also have it, perhaps due to modern intermixing, or it could be an ancient signal. The European (CEU) gray component likely reflects an affinity to the Ancestral North Indian (ANI) ancestry that usually makes up around 50% of the average South Asian genome. In this study, the European (CEU) gray component is approximately 60% of the sampled South Asian genome. Interestingly also, South Asians are about 12-15% East Asian (light yellow component), and about 15-17% Pre-Austronesian Southeast Asian (cyan component).

