DNA analysis with Genoplot

Tim Piatenko
Sep 24, 2021

I previously wrote about Genoplot, a nifty online set of DNA analysis tools; you can see my intro post here. Since then, I’ve been using it more and more and getting deeper into its functionality as my understanding of historical DNA data improves. I’ve found Genoplot to be the most advanced application that is still reasonably easy to learn and use. Let me walk through some of the more advanced features I mentioned in the previous article, in more of a “journey” flow.

To recap, Genoplot consists of 3 interconnected tools:

  1. Admixture calculators like the ones that can be run over at GEDmatch, such as Harappa K16 or Eurogenes K13.
  2. Nmonte runner that allows you to run a custom admixture analysis by selecting your own groups or individual samples and tweaking parameters, or creating new “prepackaged” calculators where you can group samples into your own categories.
  3. PCA tool that has the same inputs as Nmonte runner.

The journey starts with uploading your data and running the Harappa K16 calculator on it. This creates a new personal sample for your account that can be used in any other tool. However, to take full advantage of the suite, you need to simulate G25 coordinates, as described in the earlier note. This lets you use the same tools you can find over at Vahaduo in a much nicer, more aesthetically pleasing interface with awesome modern visualizations. And it leverages the amazing work from the Eurogenes blog.

Once you’ve got your G25 coordinates, your samples will be available in the drop-downs:

And the fun begins! Now you have access to all the G25 samples and populations, which can be analyzed in myriad ways: by geography, by age, by common origin, etc. For example, here are the “Mid-to-late Bronze Age” samples you can use. Most give the approximate date in parentheses.

You can also click the “Calculators” toggle to use the ones already built by other users, or click the “+” button and create your own. Here’s my “Bronze Age” calculator:

To create a new one, you define one “Component” at a time and give it up to 20 samples or groups.

When you run one, you get a bunch of fun visuals to absorb:

  1. Ordered list of contributing populations by %
  2. Pie chart with some details (+ oracles if you click the middle)
  3. Map view of the populations (a little buggy…)
  4. Table view, which will include all the samples you select as inputs

The number in the center of the pie chart is the “goodness of fit” measure. Below 1 is what you should be aiming for, especially for newer populations.
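To make the “goodness of fit” idea concrete, here is a minimal sketch in Python. It assumes the number is a Euclidean distance between your coordinates and the weighted mix of the source populations, scaled by 100; Genoplot’s exact definition is not stated here, so treat that as an assumption. The population names and coordinates are made up, and real G25 vectors have 25 dimensions rather than 3.

```python
import math

def model_coordinates(sources, weights):
    """Weighted average of source coordinate vectors (weights should sum to 1)."""
    dims = len(next(iter(sources.values())))
    return [sum(weights[name] * sources[name][d] for name in weights)
            for d in range(dims)]

def fit_distance(target, modeled):
    """Euclidean distance between the target and the modeled coordinates."""
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, modeled)))

# Hypothetical toy data: 3 dimensions instead of 25, invented values.
sources = {
    "Pop_A": [0.10, 0.02, -0.01],
    "Pop_B": [0.02, 0.08, 0.03],
}
target = [0.07, 0.05, 0.01]
weights = {"Pop_A": 0.5, "Pop_B": 0.5}  # a 50/50 mix

modeled = model_coordinates(sources, weights)
distance = fit_distance(target, modeled)

# Assumed display convention: distance scaled by 100.
print(f"fit: {distance * 100:.3f}")
```

The smaller the distance, the closer the modeled mix sits to your own coordinates.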

What I found to be most useful is to do one of two things:

  1. Run the DNA samples against same-age groups, like “Late Bronze Age” or “Middle Ages”
  2. Run against groups you suspect you have in your ancestry, rule out the ones with very small contributions, and iteratively repeat the process until you get the best 10-ish population fit you can get.

The former will give you a good idea of how your ancient relatives migrated over thousands of years. Some may have stayed in the same general area, while others could have traveled thousands of miles from their original areas.

Early Bronze Age
Mid-late Bronze Age
Iron Age
Middle Ages

The latter will help you zoom in on your ancestry and pick the populations that best represent what you are today. I use it with modern G25 groups.
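The “rule out the small ones and repeat” loop can be sketched in Python as well. This is only an illustration: the fitter below is a simple greedy hill climb that moves 1% of weight at a time, a stand-in I made up for this example and not the actual Nmonte algorithm. The 2% cutoff, the function names, and the 2-dimensional toy coordinates are all assumptions.

```python
import math

def distance(target, sources, weights):
    """Euclidean distance between target and the weighted mix of sources."""
    dims = len(target)
    modeled = [sum(w * sources[n][d] for n, w in weights.items())
               for d in range(dims)]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, modeled)))

def fit(target, sources, slots=100):
    """Greedy stand-in fitter: shift 1/slots of weight while the fit improves."""
    names = list(sources)
    # Start with all weight on the single closest source.
    start = min(names, key=lambda n: distance(target, sources, {n: 1.0}))
    counts = {n: 0 for n in names}
    counts[start] = slots

    def dist_of(c):
        return distance(target, sources, {n: k / slots for n, k in c.items()})

    current = dist_of(counts)
    improved = True
    while improved:
        improved = False
        for src in names:
            for dst in names:
                if src == dst or counts[src] == 0:
                    continue
                counts[src] -= 1
                counts[dst] += 1
                d = dist_of(counts)
                if d < current - 1e-12:
                    current, improved = d, True   # keep the move
                else:
                    counts[src] += 1              # undo the move
                    counts[dst] -= 1
    return {n: k / slots for n, k in counts.items()}, current

def prune_and_refit(target, sources, min_share=0.02, max_rounds=10):
    """Drop sources below min_share and refit until nothing more is dropped."""
    current = dict(sources)
    weights, dist = fit(target, current)
    for _ in range(max_rounds):
        kept = {n: current[n] for n, w in weights.items() if w >= min_share}
        if not kept or len(kept) == len(current):
            break
        current = kept
        weights, dist = fit(target, current)
    return weights, dist

# Hypothetical 2-dimensional toy coordinates (real G25 has 25 dimensions).
sources = {"Pop_A": [0.0, 0.0], "Pop_B": [1.0, 0.0], "Pop_C": [0.0, 1.0]}
target = [0.3, 0.005]

weights, dist = prune_and_refit(target, sources)
for name, share in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%}")
print(f"distance: {dist:.4f}")
```

In the tool itself you do the pruning by hand, deciding which tiny contributions are noise and which are worth keeping for historical reasons; a fixed cutoff like the one above is just the crudest version of that judgment.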

Remember that none of this is 100% accurate! Generally, the lower the number in the middle, the better the fit. But it’s still a model based on assumptions. You may get a really good fit with samples across the entire existence of the human race… but does it make sense, given that the later ones are a combination of the earlier? Whatever conclusions you reach, do a reality check before you get too excited :)

Finally, you can run your own Principal Component Analysis to see the degree of relationship among the various groups (or samples) and yourself. With PCA, remember that inputs really matter! If you select 10 close populations and one really distant one, the 10 will be all bunched up together… You need to remove the “outlier” to see the differences.
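The outlier effect is easy to reproduce with a toy PCA. The sketch below is plain Python with invented 2-dimensional points, and it finds only the first principal component by power iteration; I am not claiming this is how Genoplot computes its PCA. It measures how much of the first axis the four close populations occupy, with and without one distant point.

```python
import math

def first_component_scores(points, iterations=200):
    """Project points onto their first principal component (power iteration)."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    # Power iteration; assumes the start vector is not orthogonal to the
    # top eigenvector, which holds for this toy data.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(row[d] * v[d] for d in range(dims)) for row in centered]

def cluster_share_of_axis(points, cluster_size):
    """Fraction of the first axis's range taken up by the first cluster_size points."""
    scores = first_component_scores(points)
    cluster = scores[:cluster_size]
    return (max(cluster) - min(cluster)) / (max(scores) - min(scores))

# Hypothetical coordinates: four close populations and one distant one.
close = [[0.10, 0.02], [0.11, 0.03], [0.12, 0.01], [0.13, 0.04]]
outlier = [0.60, 0.45]

share_with_outlier = cluster_share_of_axis(close + [outlier], len(close))
share_without_outlier = cluster_share_of_axis(close, len(close))

print(f"with outlier:    {share_with_outlier:.1%} of the axis")
print(f"without outlier: {share_without_outlier:.1%} of the axis")
```

With the distant point included, the close populations are squeezed into a small slice of the axis; once it is removed, they spread across the whole plot and their differences become visible.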

Here’s an example I ran against myself and a few historical Viking DNA samples I downloaded, compared against existing VK2020 groups available in the tool + some related modern populations:

And here’s my favorite “overview of Europe and Near East” PCA based on me and my wife, as well as a number of Baltic-Viking and one ancient Levantine sample. The bigger the circle, the closer I am to them:

Now go upload your own data samples and HAVE FUN!

Tim Piatenko

I’m a Caltech particle physics PhD turned Data Scientist. Russia → Japan → US. Also on Mastodon @timoha@mastodon.world / @timoha@newsie.social 🐘