What is GEDmatch? How Does it Help Genetic Genealogists?

Video Transcription

GEDmatch doesn’t test your DNA, but you can still find cousins and a whole lot more. So what is actually in the GEDmatch database?

Howdy. I’m Andy Lee with Family Fanatics, and this is a segment of DNA. Be sure to subscribe to our channel and click on the bell if you want to be notified about upcoming episodes. GEDmatch is a database that you can submit your raw DNA files to and be able to match with the 1.2 million other people who have submitted their files already. GEDmatch is not a testing company. You can’t buy a DNA kit from them. All you can do is upload the data from the other testing companies. Now, why would you want to do that? First off, GEDmatch has some different tools and a different way to analyze and look at your DNA and be able to match with other people. The second reason is that for some of the companies, the only way to match with people in their database is to be in their database.

Namely, 23andMe and Ancestry don’t allow transfers of outside DNA kits into their databases. Considering that these two companies have 75% of all the DNA tests that have been done, if you’re not in one of those databases, you could be missing out on dozens or even hundreds of potential matches. GEDmatch was a solution to this problem by allowing uploads from all of the companies. However, it is still up to each individual to choose whether or not they want to upload their information to GEDmatch. And based on the size of GEDmatch, at 1.2 million, less than 10% of all the DNA test kits that have been made have been uploaded to GEDmatch. I wanted to get an idea of which company’s customers used GEDmatch the most, and in order to do this, I had to extract some information from the GEDmatch database. Now, GEDmatch labels a kit according to the company you tested at: an A for Ancestry, an M for 23andMe, an H for MyHeritage, and a T for Family Tree DNA.

To figure out how many total tests were from each company, I needed a large set of valid kit numbers, and then I could count up the number of A’s, M’s, T’s, and H’s. I manage several kits on GEDmatch, and so for each one I did a one-to-many match, and this gave me 2,000 kit numbers for each one of them. I stuck all these numbers in a spreadsheet, removed all the duplicates, and ended up with almost 21,000 unique kit numbers. Then it was a simple matter of adding up all the A’s, T’s, M’s, and H’s. Now this chart shows what those results were. You can see that Ancestry has the most kits in the database, followed by 23andMe and then Family Tree DNA, and lastly, MyHeritage. There’s also the other category, which includes kits from Living DNA, from WeGene, as well as some other companies.
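The counting step described above (merge the match lists, remove duplicates, tally the first letter of each kit number) can be sketched in a few lines of Python instead of a spreadsheet. This is only a rough sketch: the function name `count_kits_by_company` and the sample kit numbers are made up for illustration, and it assumes the company letter is the first character of the kit number, as described in the video.

```python
from collections import Counter

# Prefix letters as described in the video; anything else is grouped as "Other".
PREFIX_TO_COMPANY = {
    "A": "Ancestry",
    "M": "23andMe",
    "T": "Family Tree DNA",
    "H": "MyHeritage",
}


def count_kits_by_company(kit_lists):
    """Merge several match lists, drop duplicate kit numbers, tally by prefix."""
    unique_kits = set()
    for kits in kit_lists:
        # Normalize case and whitespace so "m000002" and "M000002" count once.
        unique_kits.update(k.strip().upper() for k in kits if k.strip())
    return Counter(PREFIX_TO_COMPANY.get(kit[0], "Other") for kit in unique_kits)


# Made-up kit numbers for illustration only, not real GEDmatch kits.
sample_lists = [
    ["A000001", "M000002", "T000003", "A000004"],
    ["A000001", "H000005", "Z000006", "m000002"],
]

counts = count_kits_by_company(sample_lists)
total = sum(counts.values())
for company, n in counts.most_common():
    print(f"{company}: {n} kits ({n / total:.1%})")
```

Using a set for the merge does the same job as the spreadsheet's remove-duplicates step, so a kit that shows up in several one-to-many lists is only counted once.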

The other category also includes kits that have been manufactured, like the ones that law enforcement may use, or my combined kit, where I actually took the information from all of my tests and combined them into one kit. The other category is a really small percentage of all the kits on GEDmatch, so I’m not going to discuss it much here. Now the order of the top four companies is mostly unsurprising. Ancestry is the largest with about 10 million, and then 23andMe at about 5 million. Now, MyHeritage recently overtook Family Tree DNA with about 2 million samples compared to Family Tree DNA’s 1.5 million samples. So that was the only place where the number of kits on GEDmatch doesn’t match the actual size of the different companies’ databases. But when you compare the percentage of samples in the GEDmatch database to the percentage of overall DNA samples that each company has, you start to see some interesting numbers.
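One way to make that comparison concrete is to divide each company’s share of the GEDmatch sample by its share of all tests. The sketch below uses the approximate database sizes quoted in the video. The sample counts in `example_counts` are placeholders chosen only to exercise the function; they are not the counts behind the video’s chart, and the function name `representation_ratio` is likewise just an illustrative choice.

```python
# Approximate database sizes quoted in the video, in millions of kits.
COMPANY_DB_MILLIONS = {
    "Ancestry": 10.0,
    "23andMe": 5.0,
    "MyHeritage": 2.0,
    "Family Tree DNA": 1.5,
}


def representation_ratio(gedmatch_counts, company_sizes):
    """Share of the GEDmatch sample divided by share of all tests, per company.

    Above 1.0 means over-represented on GEDmatch; below 1.0, under-represented.
    """
    sample_total = sum(gedmatch_counts[c] for c in company_sizes)
    market_total = sum(company_sizes.values())
    ratios = {}
    for company, size in company_sizes.items():
        sample_share = gedmatch_counts[company] / sample_total
        market_share = size / market_total
        ratios[company] = sample_share / market_share
    return ratios


# Placeholder counts for illustration only (NOT the video's actual results).
example_counts = {
    "Ancestry": 600,
    "23andMe": 200,
    "MyHeritage": 50,
    "Family Tree DNA": 150,
}

ratios = representation_ratio(example_counts, COMPANY_DB_MILLIONS)
for company, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    label = "over" if ratio > 1 else "under"
    print(f"{company}: {ratio:.2f} ({label}-represented)")
```

Swapping the placeholder counts for real tallies from a sample of kit numbers would give the actual over- or under-representation for each company.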

Both Ancestry and Family Tree DNA have a larger representation in the GEDmatch database than you would expect just from their overall database size. 23andMe is less represented. And at the very bottom is MyHeritage, which is far below what we’d expect based on their database size. Now, why this disparity? I can think of a few reasons, but they don’t cover everything. First, Family Tree DNA has always been the genetic genealogy company that caters to the serious genetic genealogists. So it stands to reason that those people would be the first to adopt GEDmatch and be the most willing to put their information on GEDmatch. But that can’t be the reason why Ancestry is overrepresented in the GEDmatch database, because a lot of their marketing is targeted towards the casual user of genealogy or even the ones that are just getting a genetic genealogy test for the ethnicity results.

So I really can’t think of a reason that explains Ancestry’s overrepresentation in the GEDmatch database. Now, 23andMe has always had genealogy as a secondary function of their DNA gathering, so I would expect them to be underrepresented. MyHeritage is a conundrum because I would expect that their clientele would be at least as interested in genealogy and genetic genealogy as Ancestry’s clientele is. Yet they far underperform when it comes to submitting to GEDmatch. Since MyHeritage is the youngest of the genetic genealogy testing companies, it might be that a majority of their clientele have already tested at one of the other companies, and so there’s no need to submit another test to GEDmatch. Or another explanation is that they might have just had so much recent growth that a lot of their customers haven’t even had the opportunity to submit their information to GEDmatch. Maybe in another year, when I redo this analysis, I can see whether or not anything has changed that would help me to understand why MyHeritage seems to be so underrepresented in the GEDmatch database.

So with these numbers, I also wanted to see what percentage of each company’s clients have submitted their information to GEDmatch. So first I figured out how many samples in the GEDmatch database come from each company, and then I divided that number by the total size of each company’s database. And you can see that over 11% of Family Tree DNA’s customers have uploaded their information to GEDmatch, while only 1.4% of MyHeritage customers have uploaded to GEDmatch. Ideally, I’d like all these numbers to be really close to a hundred percent, but even if all of the companies were only at 11% like Family Tree DNA, then the GEDmatch database would double in size and it would be that much more useful. So if you haven’t uploaded to the GEDmatch database, I’d urge you to go and check it out. It’s free to upload and use their basic tool set. And I have videos with links in the description below on how to download your raw DNA file and upload it to GEDmatch. As always, if you have any questions about the GEDmatch database, put them in the comments below and I’ll try to answer them for you. And if you like this video, be sure to give it a thumbs up and pass it on to your friends.
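The upload-rate arithmetic described above can be sketched as follows, using only the rounded figures quoted in the video (1.2 million GEDmatch kits, the four company database sizes, and the 11% and 1.4% rates). The function names are hypothetical, and because the inputs are rounded the outputs should be read as rough estimates rather than exact counts.

```python
# Rounded figures quoted in the video; treat them as approximations.
GEDMATCH_TOTAL = 1_200_000
COMPANY_DB_SIZE = {
    "Ancestry": 10_000_000,
    "23andMe": 5_000_000,
    "MyHeritage": 2_000_000,
    "Family Tree DNA": 1_500_000,
}


def estimated_upload_rate(sample_share, gedmatch_total, company_db_size):
    """Scale a company's share of the sample up to all of GEDmatch,
    then divide by the size of that company's own database."""
    kits_on_gedmatch = sample_share * gedmatch_total
    return kits_on_gedmatch / company_db_size


def kits_implied_by_rate(rate, company_db_size):
    """Inverse view: how many kits a given upload rate corresponds to."""
    return rate * company_db_size


ftdna_kits = kits_implied_by_rate(0.11, COMPANY_DB_SIZE["Family Tree DNA"])
myheritage_kits = kits_implied_by_rate(0.014, COMPANY_DB_SIZE["MyHeritage"])

# What-if: every company reaches the same 11% upload rate.
projected_total = sum(
    kits_implied_by_rate(0.11, size) for size in COMPANY_DB_SIZE.values()
)
growth_factor = projected_total / GEDMATCH_TOTAL

print(f"Family Tree DNA kits implied by an 11% rate: {ftdna_kits:,.0f}")
print(f"MyHeritage kits implied by a 1.4% rate: {myheritage_kits:,.0f}")
print(f"Projected GEDmatch size at 11% everywhere: {projected_total:,.0f}")
print(f"Growth factor versus 1.2 million: {growth_factor:.2f}x")
```

With exactly 11% and these rounded database sizes, the projection lands somewhat under a full doubling. The quoted rate is "over 11%", so the real figure would be a bit higher.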