My DNA discovery journey

Tim Piatenko
May 4, 2021

I have always enjoyed history, especially of the ancient world. Something about things lost in time that you can no longer directly experience fascinates me. A few years back, I took the 23andMe DNA test to see if it would shed some light on my family origins, as much was lost in the 20th century during the multiple Russian upheavals… The results were interesting, though still quite porous.

My 23andMe results V1

Then 23andMe started updating their algorithms, focusing on a “depth-first” approach and prioritizing more recent history, while ignoring the bigger picture. Deeply disappointed in the new version of their ancestry composition results, I embarked on a journey through dozens of services, including FTDNA, MyHeritage (where I ended up taking a second DNA test for validation), DNA.Land, Sequencing, etc.

My 23andMe results V2

I ran my raw data from both services everywhere I possibly could and collected a wide array of results. Most were echoing the original 23andMe composition, but there were some serious fluctuations…

It was clear I had Slavic and Northwest European roots, plus some Ashkenazi blood (which, as I learned later, is almost identical to Sardinian and other South Italian markers, but that’s a whole other story in itself…)

What was more fascinating were the occasional Finnish and Siberian traces, as well as an unknown chunk of something “Balkan”. Some seemed to suggest Hungarian roots too.

Even the same provider could not seem to fully reconcile my two separate test results.

I was realizing that the various services have their specific customer bases, which not only determine the reference populations they have access to, but also the agenda they choose to follow to satisfy their audience.

So I figured I’d approach this as a statistical triangulation problem and amass as many results from various sources as I possibly could. I’m a data scientist and understand how hard statistical modeling is. Each model comes with its own assumptions and limitations, so the more options you have, the better. I even combined my raw data from 23andMe with MyHeritage by hand (since they mostly test different markers) to get a genetic “superkit” to use.
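For anyone who would rather script that kind of merge than do it by hand, here is a minimal Python sketch. It is not the procedure used for this post. It assumes the 23andMe export is tab-separated (rsid, chromosome, position, genotype) with `#` comment lines, and that the MyHeritage export is a quoted CSV with an `RSID,CHROMOSOME,POSITION,RESULT` header. The marker IDs and genotypes are invented placeholders, and the function names are hypothetical.

```python
import csv

# Tiny made-up excerpts in the ASSUMED file layouts; not real genotype data.
KIT_23ANDME = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tAG
"""

KIT_MYHERITAGE = """# example header comment
RSID,CHROMOSOME,POSITION,RESULT
"rs0000002","1","2000","AG"
"rs0000003","2","3000","CC"
"""


def read_23andme(text):
    """Parse tab-separated lines into {rsid: (chromosome, position, genotype)}."""
    calls = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rsid, chrom, pos, genotype = line.split("\t")
        calls[rsid] = (chrom, int(pos), genotype)
    return calls


def read_myheritage(text):
    """Parse quoted CSV lines into {rsid: (chromosome, position, genotype)}."""
    lines = [l for l in text.splitlines() if l and not l.startswith("#")]
    calls = {}
    for row in csv.DictReader(lines):
        calls[row["RSID"]] = (row["CHROMOSOME"], int(row["POSITION"]), row["RESULT"])
    return calls


def merge_kits(primary, secondary):
    """Union of markers. On overlap the primary call is kept, and markers
    whose genotype strings differ are reported for manual review."""
    merged = dict(primary)
    conflicts = []
    for rsid, call in secondary.items():
        if rsid not in merged:
            merged[rsid] = call
        elif merged[rsid][2] != call[2]:
            conflicts.append(rsid)
    return merged, conflicts


superkit, conflicts = merge_kits(read_23andme(KIT_23ANDME),
                                 read_myheritage(KIT_MYHERITAGE))
print(len(superkit), "markers in the merged kit; conflicts:", conflicts)
```

The genotype comparison is a plain string match, so a real merge would also need to decide how to treat calls like `AG` versus `GA`, no-calls, and different genome builds.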

FTDNA results for 23andMe raw data

V3 of FTDNA ancestry results (based on my 23andMe raw data) seemed very reasonable, given what is known of my recent family history. And it was in line with the latest from MyHeritage.

MyHeritage v3 ethnicity estimate results

There was the question of German heritage, which is what my great-grandfather was, according to family tradition… Yet it almost never showed up. I even went to GEDMatch and ran my data through all the possible models they offer for the more scientifically inclined. I read all the guides and many reference materials, and that’s when I became aware of an emerging trend in archeology: sequencing DNA recovered from burials.

GEDMatch JTEST admixture results for 23andMe+MyHeritage raw data

By this point, I was fairly confident in my general understanding of my ancestral makeup, especially in light of my haplogroup analysis, which showed I was Baltic-Finnic N1c1a1a1 L550 on my dad’s side and (likely) Danish T2a1b1a1 on my mom’s. There was clearly a Siberian-Baltic-Scandinavian trace there. But I was lacking a “story” to tie it all together.

Haplogroups derived from 23andMe data via Sequencing.com

And this is when I discovered MyTrueAncestry. At first, I thought it was cute, something of a “fun factoid” machine. But the deeper I went, the more serious it became.

The difference here is that in addition to comparing your own DNA markers against multiple reference populations that are considered representative of modern ethnicities, this service goes out and obtains archeological DNA that is technically in the public / research domain. Your data is compared to individual historical samples 1:1, just like other services do when providing you with potential relative matches.

MyTrueAncestry map of ancient DNA sample matches, based on my raw DNA. Yellow dots signify direct SNP segment overlap, suggesting distant family relationships.

This blew my mind! I did not realize that the world of archeology as a whole has been turned upside down by the newly productionalized DNA sequencing technologies. Paper after paper is being published as we speak, looking into long accepted as well as wildly disputed historical “facts”. We are finally at the point where we no longer have to rely solely on ancient manuscripts and individual items. We can trace the genetic code thousands of years into the past directly!

Example of a Y-DNA haplogroup graph from MyTrueAncestry

The site provides multiple levels of membership that open more and more doors. You can look at matches, compare to ancient and modern populations, build maps and timelines, explore haplogroup clusters, and zero in on matching chromosome segments. And the best thing: the resulting storyline fit in with the other analyses!

Sample from my “family story” timeline from MyTrueAncestry

What makes the service really interesting is the ability to look up further information about a potential match, including the actual scientific papers behind the data.

An example of DNA segment matches for a particular archeological sample from MyTrueAncestry

But while I found the built-in tools quite good, there were still a few things I wasn’t completely happy with. So I put my analyst hat on and embarked on a small project to extract the text from all the results, compile it in one dataset, and create my own visualizations.

An example of a blurb and additional resources for a match from MyTrueAncestry

Basically, I copied ALL the text from the various tabs of analysis results and pasted them into a few text documents. I then created a CSV file with the following structure:

Culture,Age,Location,Region,Lat,Long,Relation,Link

It took some time to find the various places in Maps and grab their coordinates, and copy-paste the dates and genetic distance numbers, but in the end I had a neat little file with everything I needed. I fired up R and went to work.
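To make the file layout concrete, here is a minimal sketch of loading a CSV with that header, ordering the matches in time, and splitting them into BC and AD groups. The real analysis was done in R, so this Python version is only illustrative. The rows are invented placeholders rather than actual matches, and the `Age` format (`"2300 BC"`, `"540 AD"`) is an assumption, as are the helper names.

```python
import csv
import io

# Hypothetical rows in the Culture,Age,... layout; all values are placeholders.
SAMPLE_CSV = """Culture,Age,Location,Region,Lat,Long,Relation,Link
Sample B,540 AD,Place B,Region B,47.5,19.0,9.8,https://example.org/b
Sample A,2300 BC,Place A,Region A,57.5,18.5,12.3,https://example.org/a
Sample C,900 AD,Place C,Region C,59.3,18.1,7.1,https://example.org/c
"""


def parse_age(age):
    """Turn an ASSUMED 'NNNN BC' / 'NNNN AD' string into a signed year (BC is negative)."""
    number, era = age.split()
    year = int(number)
    return -year if era.upper() == "BC" else year


def load_matches(text):
    """Read the CSV text, convert numeric columns, and sort oldest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Year"] = parse_age(row["Age"])
        row["Lat"] = float(row["Lat"])
        row["Long"] = float(row["Long"])
        row["Relation"] = float(row["Relation"])  # genetic distance number
        rows.append(row)
    return sorted(rows, key=lambda r: r["Year"])


matches = load_matches(SAMPLE_CSV)
bc = [m for m in matches if m["Year"] < 0]   # candidates for a "BC" map panel
ad = [m for m in matches if m["Year"] >= 0]  # candidates for an "AD" map panel

for m in matches:
    print(m["Year"], m["Culture"], m["Lat"], m["Long"])
```

A signed `Year` column makes a timeline a matter of sorting, and the same column can drive the BC versus AD split for a two-panel map.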

First, I created my own version of the provided maps:

Map of SNP segment matches from MyTrueAncestry
My custom ggplot2 map based on MyTrueAncestry results
My custom ggplot2 map based on MyTrueAncestry results split by BC vs AD

Then I remade the timeline, which was too hard to read and had to be split into multiple separate images. I did not need the pictures, but I prefer color-coding the labels.

Part of the overall Ancestral Timeline from MyTrueAncestry
My custom vertical timeline made in ggplot2 based on MyTrueAncestry results

And then the radar chart just for fun. I can’t say I’m 100% happy with it now, but it does what I need it to do.

So now I’m a happy puppy! I have a pretty cool family history dating back to over 6000 years ago. I’m still missing the last few hundred years… and I can’t be 100% sure of any of it, of course, but I have a consistent narrative for myself:

  • We started at two opposite ends of Eurasia — in Mesolithic Western Europe and Southern Siberia
  • Swedish Gotland and the Baltic emerged in the timeline in the Neolithic
  • Meanwhile, the Eurasian Steppe people gradually moved West
  • The 1st Millennium BC was Baltic-Scythian
  • Slavs (and some Avars) came onto the scene in mid-1st Millennium AD
  • Then the Viking age exploded all over Europe…
  • Norwegians went into the Atlantic to Iceland
  • Danes went to England or stayed close to home
  • Swedish Rus led by Rurik went down into Eastern Europe
  • After all was said and done, most of my family was East Slavic, but parts lingered in Sweden, Denmark, Germany, and the Baltic states.
MapMyGenes modern edition from the Sequencing.com app store
My custom ggplot2 map of DNA matches from MyHeritage

Tim Piatenko

I’m a Caltech particle physics PhD turned Data Scientist. Russia → Japan → US. Also on Mastodon @timoha@mastodon.world / @timoha@newsie.social 🐘