In Part 1 I went through the nitty-gritty of getting publicly available DNA data from research publications and turning it into easy-to-use text files in the commercial 23andMe format (RSID + chromosome + position + genotype/allele) that can be uploaded to various sites offering a slew of DNA analyses. Now, let’s get into the fun stuff.
I’ve gone through a sampling from a few different datasets, including
The first study was looking at the migration from Northern Mongolia to North America. …
If you’ve read any of my previous posts, you know I’m an avid user of MyTrueAncestry — a DNA data analysis service that accepts uploads of raw data from popular testing companies like 23andMe or MyHeritage. What’s different about them is that instead of (just) running a PCA on your DNA data against “known” populations, they pull in actual historical DNA samples retrieved from various archeological sites. The more you pay, the more samples you have access to, and the more complete your ancestral portrait.
I discovered a fun new DNA data visualization tool that is not only an awesome alternative to the completely outdated GEDmatch reports and visualizations, but packs some cool extra features that are not obvious at first glance, including running your own admixture calculations or user-contributed models on various samples, and plotting custom PCAs.
Genoplot is visually pleasing, but is not very intuitive… Also, there’s a trick you need to know to actually load your own samples into the additional tools beyond the standard admixture. So here’s a step-by-step guide to get you started.
In the top right corner we have…
Since I’ve become involved in multiple Facebook groups dealing with DNA, ancestry, and genealogy, I’ve realized that most people have only a very vague idea of what is underneath all of this. And many seem to have false preconceived notions… Let’s walk through the basic steps of how DNA testing and subsequent data analyses actually work. I’m not claiming deep expertise here, just a reasonable basic understanding and working knowledge.
Let’s start at the ground level. This is DNA:
It lives in the nucleus of each cell in your body and is responsible for all reproduction / regeneration of a living…
(TLDR: fun stuff is at the end)
Well, I think I have officially lost my mind 🤪 I spent a good chunk of my past week figuring out where to find published research DNA samples and how to convert them from the crazy data formats geneticists use to a simple CSV like you get from 23andMe or MyHeritage.
And I finally succeeded!!!
While it’s still all fresh in my mind, I’m going to document this feat here. Maybe I’ll do more. Maybe someone else will try to replicate what I’ve done. Who knows. For the benefit of humanity 😄
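For reference, the target format really is simple. A 23andMe-style raw data file is plain text with one line per marker (RSID, chromosome, position, genotype) separated by tabs, plus comment lines starting with `#`. Here is a minimal Python sketch of writing and reading such a file. The function names and the sample markers are made up for illustration, and this is not the conversion pipeline itself, just the shape of the output it aims for.

```python
import csv
import io


def write_23andme_style(records, out):
    """Write (rsid, chromosome, position, genotype) tuples as tab-separated lines."""
    # Comment/header line, as found at the top of 23andMe-style raw files.
    out.write("# rsid\tchromosome\tposition\tgenotype\n")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for rsid, chrom, pos, genotype in records:
        writer.writerow([rsid, chrom, pos, genotype])


def read_23andme_style(lines):
    """Parse tab-separated lines back into tuples, skipping blanks and '#' comments."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        rows.append((rsid, chrom, int(pos), genotype))
    return rows


# Made-up markers, purely illustrative (not real sample data).
sample = [("rs0000001", "1", 12345, "AG"), ("rs0000002", "X", 67890, "CC")]

buf = io.StringIO()
write_23andme_style(sample, buf)
text = buf.getvalue()
parsed = read_23andme_style(text.splitlines())
```

Chromosome stays a string on purpose, since values like X, Y, and MT are not numbers.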
When I get into something, I tend to REALLY get into it. I don’t stop till I’ve learned as much as I can, or something new and more exciting comes along :) For the past year or so, I’ve been neck deep in human DNA ancestry. It’s a fascinating and quickly developing topic, fueled by advances in genetics, statistics, and computing power. We can extract and analyze the entire genome of any organism, and run statistical analyses on large samples. It’s a very rich field, and as such, it’s pretty damn confusing at times.
I have always enjoyed history, especially of the ancient world. Something about things lost in time that you can no longer directly experience fascinates me. A few years back, I took the 23andMe DNA test to see if it would shed some light on my family origins, as much was lost in the 20th century during the multiple Russian upheavals… The results were interesting, though still quite porous.
Then 23andMe started updating their algorithms, focusing on a “depth-first” approach and prioritizing more recent history, while ignoring the bigger picture. Deeply disappointed in the new version of their ancestry composition results, I…
It’s always hard for me to remember the scales of things… so here are some facts I keep learning and re-learning, put in perspective:
I’ve been wanting to write about this topic for many months now, and finally got around to it… I thought a lot about the right format, and in the end decided on a good old slide deck, but with a voiceover added.
The deck is here (as a PDF) in case you want to read through it yourself.
And the video with my (slightly rushed) voiceover is below:
In creating this and learning a whole lot about my own genetic past, I used the following sources:
Here are the best I’ve found:
I’m a Caltech particle physics PhD turned Data Scientist, currently working as an independent consultant.