The fundamental processes of population differentiation and speciation are, at their genesis, the same when viewed from a genealogical perspective. When two populations first begin to diverge from a single interbreeding population, the gene copies within both descendant populations at any particular locus share many ancestors in common. In the absence of genetic exchange between these populations, genetic drift leads over time to sorting of the gene lineages. As some gene lineages proliferate and others go extinct, patterns of exclusive ancestry evolve within each population. If genetic drift is unopposed, gene copies at all loci within each interbreeding population will eventually reach a state of reciprocal monophyly. From coalescent theory we know that, for neutral genes, the time frame of this transition varies both because of the stochastic nature of the evolutionary sampling process and as a function of the inbreeding effective size of the locus considered. Until sorting is complete, genealogies at individual loci can therefore disagree with one another and with the history of population divergence. Introgressive hybridization can also create such mismatches when gene flow between divergent populations brings distinct genes across the boundary of differentiation. It is at these later time points that the delineation of species becomes problematic, for definitional as well as analytical reasons.
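The dependence of sorting time on effective size can be made concrete with a standard result for the neutral coalescent. It is stated here as a sketch under assumptions not spelled out above: a single randomly mating population of constant diploid effective size $N_e$, an autosomal locus, and a sample of $k$ gene copies. The symbols $N_e$, $k$, and $T_j$ are introduced only for this illustration.

```latex
% T_j: time, in generations, during which the sample has j ancestral lineages
\mathbb{E}[T_j] = \frac{4N_e}{j(j-1)}, \qquad
\mathbb{E}[T_{\mathrm{MRCA}}] = \sum_{j=2}^{k} \mathbb{E}[T_j]
  = 4N_e\left(1 - \frac{1}{k}\right)
```

Each $T_j$ is approximately exponentially distributed, so realized coalescence times scatter widely around these expectations, and a locus with a smaller effective size is expected to sort sooner.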
Our recent research has focused on developing a novel statistic, the genealogical sorting index, for the problems of species delineation and population differentiation. Conceptually, the approach is more consistent with coalescent theory than other methods applied to population differentiation. It is also more powerful because it uses the information inherent in genealogies. In addition, the statistical methodology accommodates uncertainty in genealogies by integrating (marginalizing) over them, as is done in maximum likelihood and Bayesian methods of parameter estimation in population genetics. We have developed software for calculating the genealogical sorting index and assessing its associated probability value for hypothesis testing.
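To illustrate the kind of calculation involved, the following Python sketch computes a sorting statistic on a rooted genealogy, estimates a probability value by permuting group labels across the tips, and averages the statistic over a set of trees. It is illustrative only and is not the software mentioned above. The formulation is an assumption not given in this text: each node uniting a group contributes its number of children minus one, the statistic is rescaled between its minimum and maximum for the tree, and trees are weighted equally when averaging. The nested-tuple tree encoding and all function names are hypothetical.

```python
import random


def tip_names(node):
    """List the tip names of a tree written as nested tuples with string leaves."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in tip_names(child)]


def gsi(tree, group_of, target):
    """Sorting index of group `target` on `tree` (assumed formulation, see lead-in)."""
    tips = tip_names(tree)
    m = sum(1 for t in tips if group_of[t] == target)
    if m < 2 or m == len(tips):
        raise ValueError("group needs at least 2 tips and fewer than all tips")

    visited = []  # post-order list of (target tips below node, children - 1)

    def walk(node):
        if isinstance(node, str):
            return 1 if group_of[node] == target else 0
        below = sum(walk(child) for child in node)
        visited.append((below, len(node) - 1))
        return below

    walk(tree)
    all_cost = sum(cost for _, cost in visited)

    # Nodes uniting the group: those with some (not all) target tips below them,
    # plus the first post-order node holding all target tips (the group's MRCA).
    uniting_cost = 0
    mrca_seen = False
    for below, cost in visited:
        if 0 < below < m:
            uniting_cost += cost
        elif below == m and not mrca_seen:
            uniting_cost += cost
            mrca_seen = True

    n = m - 1                  # minimum number of nodes needed to unite the group
    gs = n / uniting_cost      # equals 1 when the group is monophyletic
    min_gs = n / all_cost      # value when every node in the tree is needed
    return (gs - min_gs) / (1.0 - min_gs)


def permutation_p_value(tree, group_of, target, reps=999, seed=0):
    """Share of label permutations with an index at least as large as observed."""
    rng = random.Random(seed)
    tips = tip_names(tree)
    labels = [group_of[t] for t in tips]
    observed = gsi(tree, group_of, target)
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)
        if gsi(tree, dict(zip(tips, labels)), target) >= observed:
            hits += 1
    # Add-one correction (an assumed convention) keeps the estimate above zero.
    return (hits + 1) / (reps + 1)


def mean_gsi(trees, group_of, target):
    """Average the index over a sample of trees, weighting each tree equally."""
    return sum(gsi(t, group_of, target) for t in trees) / len(trees)


# Two toy genealogies on the same eight tips: fully sorted and fully interleaved.
sorted_tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
mixed_tree = ((("a1", "b1"), ("a2", "b2")), (("a3", "b3"), ("a4", "b4")))
group_of = {name: name[0].upper() for name in tip_names(sorted_tree)}

print(gsi(sorted_tree, group_of, "A"))   # 1.0
print(gsi(mixed_tree, group_of, "A"))    # 0.0
print(mean_gsi([sorted_tree, mixed_tree], group_of, "A"))  # 0.5
print(permutation_p_value(sorted_tree, group_of, "A"))
```

Averaging over a list of trees stands in for integrating over genealogical uncertainty; with a posterior or bootstrap sample of trees, each tree would instead be weighted by its estimated probability.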