How Phasing Improves GEDmatch DNA Match Results

Video Transcription

(00:00):
You have lots of matches, and I’ve told you before that a lot of those matches aren’t even real matches. But how can we find out which matches are the real matches? Well, if you happen to have a parent that is tested as well, you can phase your kit to help improve your match results.

(00:22):
Howdy, I’m Andy Lee with Family History Fanatics, where we help you understand your DNA, climb your family tree, and write the story of your ancestors along the way. Now, many people will ask, what is the purpose of phased kits? The simple answer is that phasing can improve your match results. You have two of each chromosome, one from your mother and one from your father. When the companies test and report that information, they don’t know which letter belongs on which chromosome. Now if you have, let’s say, two A’s at a certain location, then yeah, an A belongs on each chromosome. But if you have an A and a C, well, one chromosome has an A and one has a C, but they don’t know which chromosome is which. And so some of those matches are false matches because the comparison isn’t looking at a single chromosome. It’s actually jumping between chromosomes as it goes through the match.
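
To make the idea of "jumping between chromosomes" concrete, here is a minimal Python sketch. It is a toy illustration only, not how GEDmatch or any testing company actually compares kits. The function names and the four-SNP example are made up, and real matching runs over thousands of SNPs with a centimorgan threshold.

```python
def unphased_match(kit_a, kit_b):
    """Two unphased kits 'match' if they share at least one letter at every SNP."""
    return all(set(a) & set(b) for a, b in zip(kit_a, kit_b))


def phased_match(haplotype, kit):
    """A single chromosome matches only if its own letter appears at every SNP."""
    return all(allele in genotype for allele, genotype in zip(haplotype, kit))


# Hypothetical letters on my two copies of one chromosome stretch.
my_maternal = ["A", "A", "G", "G"]
my_paternal = ["C", "C", "T", "T"]

# What the testing company reports: both letters per SNP, with no order.
my_unphased = list(zip(my_maternal, my_paternal))

# Hypothetical other tester, shown as unphased letter pairs.
other = [("A", "A"), ("A", "A"), ("T", "T"), ("T", "T")]

print(unphased_match(my_unphased, other))  # match found by mixing both chromosomes
print(phased_match(my_maternal, other))    # maternal chromosome alone
print(phased_match(my_paternal, other))    # paternal chromosome alone
```

In this toy case the unphased comparison reports a match, but it only works by using my maternal letters for the first two SNPs and my paternal letters for the last two. Neither chromosome matches on its own, which is the kind of false match phasing is meant to remove.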

(01:24):
Phasing compares your DNA with one or both of your parents in order to separate it out into a maternal chromosome kit and a paternal chromosome kit. So now when you are doing a match, you’re not comparing some of your dad’s and some of your mom’s DNA. You are only comparing the DNA from your dad or the DNA from your mom, and that improves your match results. But by how much does it really improve your match results? So I did a little experiment, and let me go through what all that involved. First off, I only wanted to use matches that shared at least 10 centimorgans. Why stop at 10 centimorgans? Well, as you go below 10 centimorgans, lots of research has shown that the number of false matches increases quite a bit. I created a phased kit using both of my parents’ kits. So I compared my kit to both of my parents, and that helped divide out which part of my kit is the maternal side and which is the paternal side. Next, I gathered the match list of just my regular kit. I gathered the match lists of my paternal phased kit and my maternal phased kit, and I gathered a shared match list between my kit and my mom and between my kit and my dad.
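
As a rough sketch of the idea behind phasing with both parents, the Python below decides, SNP by SNP, which of the child's two letters came from the mother and which from the father. This is an assumption-laden illustration: `phase_snp` and `phase_kit` are hypothetical names, the data is invented, and GEDmatch's actual phasing tool may work differently.

```python
def phase_snp(child, mother, father):
    """Return (maternal_letter, paternal_letter) for one SNP, or None if unclear."""
    a, b = child
    if a == b:
        # Homozygous: the same letter sits on both chromosomes.
        return a, b
    a_from_mother = a in mother and b in father
    b_from_mother = b in mother and a in father
    if a_from_mother and not b_from_mother:
        return a, b
    if b_from_mother and not a_from_mother:
        return b, a
    # Either assignment works (e.g. all three are A/C) or neither does.
    return None


def phase_kit(child, mother, father):
    """Build maternal and paternal strings, using '?' where phasing is unclear."""
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        result = phase_snp(c, m, f)
        mat, pat = result if result else ("?", "?")
        maternal.append(mat)
        paternal.append(pat)
    return "".join(maternal), "".join(paternal)


# Hypothetical four-SNP kits, as unordered letter pairs.
child = [("A", "C"), ("G", "G"), ("A", "C"), ("T", "C")]
mother = [("A", "A"), ("G", "T"), ("A", "C"), ("C", "C")]
father = [("C", "C"), ("G", "G"), ("A", "C"), ("T", "T")]

maternal_kit, paternal_kit = phase_kit(child, mother, father)
print(maternal_kit, paternal_kit)
```

The third SNP stays unresolved because the child and both parents all carry A and C, so the letters alone cannot say which side each one came from.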

(02:57):
In a perfectly organized family where everybody always had the same number of children and there was the exact same number of people that tested on both sides of the family, you would expect to have 50% of your matches from your mom and 50% of your matches from your dad. Let me start off by telling you no family is that perfect. It just doesn’t happen, and mine is no exception. So let’s look at some data of these phased kits. Now again, this is comparing my match list with my paternal phased kit and my maternal phased kit match lists. If I look at my maternal phased kit match list, it only has 24.3% of the matches that are on my regular match list. That’s a lot less than 50%. So on my mother’s side of the family, it basically looks like there’s either not as many children and grandchildren and great-grandchildren, or they just don’t test their DNA nearly as much.

(04:07):
On my paternal side, I had 52.6% of the matches from my regular kit. Now, this is right around the level that you would expect for a perfect family, but it’s twice as much as what my maternal kit had. So this side of the family either has twice as many kids or they have twice as many people that want to have their DNA tested. The whole point of this video, though, is the people that don’t match either one. Remember, the whole point of phasing your kit is to improve your matches, to get rid of matches that aren’t really matches. And there was 24.4% of my matches that didn’t match either my maternal phased kit or my paternal phased kit. So most likely these matches were actually using segments from both of ’em to say that it was a match. So a quarter of my matches didn’t match.
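
The comparison described above comes down to set operations on match lists. Here is a minimal Python sketch, assuming each match list has been exported as a collection of kit IDs. The IDs and function names are made up for illustration, and the toy numbers are not the figures from the video.

```python
def classify_matches(regular, maternal_phased, paternal_phased):
    """Split the regular match list by which phased kit(s) each match also hits."""
    regular = set(regular)
    maternal = regular & set(maternal_phased)
    paternal = regular & set(paternal_phased)
    return {
        "maternal": maternal,
        "paternal": paternal,
        "both": maternal & paternal,
        "neither": regular - maternal - paternal,  # likely false matches
    }


def as_percent(groups, total):
    """Express each group as a percentage of the regular match list."""
    return {name: round(100 * len(ids) / total, 1) for name, ids in groups.items()}


# Hypothetical kit IDs for matches of 10 cM or more.
regular = {"K1", "K2", "K3", "K4", "K5", "K6", "K7", "K8"}
maternal_phased = {"K1", "K2"}
paternal_phased = {"K2", "K3", "K4", "K5"}

groups = classify_matches(regular, maternal_phased, paternal_phased)
pct = as_percent(groups, len(regular))
print(pct)
```

The "neither" group is the one of interest here: matches on the regular list that do not show up on either phased kit.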

(05:02):
Now these were all people who shared 10 centimorgans or more with me. So we’re not talking about people that share small segments. If we look at these three percentages, 24.3, 52.6, and 24.4, they add up to 101.3%. It should add up to 100% in a perfect world. Why doesn’t it? That’s because some of those matches actually match both my maternal kit and my paternal kit. Now, I do not have a lot of multiple lines of relationship in my family tree. There’s a couple that I’ve found, but they’re not necessarily between my maternal and my paternal side. They’re actually either on my maternal side or on my paternal side, through multiple lines on those sides. So having a little bit less than 1% of matches that match both sides was interesting to me. If your family has endogamy, this number’s probably going to be really high.
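
The reason the three percentages can add up to more than 100% is simple counting. As a sketch, write $R$ for the regular match list, $M$ and $P$ for the matches also found on the maternal and paternal phased kits, and $N$ for the matches found on neither. These symbols are introduced here for illustration and are not from the video.

```latex
\[
|M| + |P| + |N| = |R| + |M \cap P|
\quad\Longrightarrow\quad
\frac{|M|}{|R|} + \frac{|P|}{|R|} + \frac{|N|}{|R|}
  = 100\% + \frac{|M \cap P|}{|R|}
\]
\[
24.3\% + 52.6\% + 24.4\% = 101.3\%
\]
```

Anyone on both phased lists is counted twice, once under $M$ and once under $P$, so the amount over 100% is the share of matches that hit both sides. Since the quoted percentages are rounded, the excess computed this way only approximates that share.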

(06:08):
So that was the paternal kit and the maternal kit. Each of the companies has a shared, or “in common with,” tool, and I also wanted to see, well, how good would a shared match list be compared to a phased kit? So I took a look at a shared match list for myself and my mom and a shared match list for myself and my dad, and compared those match lists to just the match list for myself. On my maternal side, using the shared match list, 26.5% of my regular matches were also on the shared match list. So a little bit more than with the phased kit, but right in the same range.

(06:54):
My paternal side had 54%, again a little bit higher, but still within the same range. Now the neither group, in other words the matches that showed up on my match list but were not present on either my maternal or my paternal shared match list, was 21%. And again, based on the higher percentages of the other two, this is right around where it’s expected. And again, it’s right within the range of where we’d expect with the phased kit. If we go and add all of those up, we get 102.3%. So I would actually expect more matches that match both of my parents. And in fact, yeah, using the shared match lists, it ends up that almost 2% of those matches are found on both my mom’s side and my dad’s side. So this was some really interesting data, and I thought, okay, now what could I do to compare this more?

(07:52):
And I said, what if I actually combine these two lists and see what the discrepancies are between them? So I did a combined analysis of the phased kit matches and these shared match lists. On my maternal side, there is 23.8% of matches that show up both on the phased kit and on the shared match list. On the paternal side, it’s 51% that show up on the phased kit and the shared match list. Now the neithers actually drop even lower, down to just 19.7%. What this means is that these are the matches that don’t show up on any of those four lists: the maternal phased kit, the paternal phased kit, the mom-and-me shared match list, and the dad-and-me shared match list. So I expect that number to be lower, and it actually has dropped some. Adding all those up, whoa, now I get 94.7%.

(08:55):
Why is that so low? Remember, before, when we had over a hundred percent, we really had to subtract out those matches that matched both, and in this case we’re under a hundred percent. So subtracting out the both matches is actually gonna make it even less. People that matched both were 0.7% of all matches. So what is going on? We have to look at one other thing, and that is the people who matched the phased kit but did not match on the shared match list. So when we’re looking at my maternal phased kit, they were on that list, but they weren’t on the shared match list with my mom. That ended up being about 2.8% of matches. That’s actually a pretty significant amount. On the paternal side, it was 3%. These are the people that matched my paternal phased kit but were not on the shared match list for my dad and me. On the flip side, we also need to look at those who don’t match the phased kit, but they do show up on a shared match list.
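
The two kinds of discrepancy described here, on the phased kit but not the shared match list and the reverse, are set differences. A minimal Python sketch follows, with made-up kit IDs and a hypothetical `compare_lists` helper; it is an illustration of the bookkeeping, not the spreadsheet used in the video.

```python
def compare_lists(phased, shared):
    """Compare one phased-kit match list with the shared match list for that parent."""
    phased, shared = set(phased), set(shared)
    return {
        "on_both": phased & shared,      # agrees on both methods
        "phased_only": phased - shared,  # on the phased kit, missing from shared list
        "shared_only": shared - phased,  # on the shared list, missing from phased kit
    }


# Hypothetical kit IDs on the maternal side.
maternal_phased = {"K1", "K2", "K3"}
shared_with_mom = {"K2", "K3", "K4"}

result = compare_lists(maternal_phased, shared_with_mom)
for name, ids in result.items():
    print(name, sorted(ids))
```

Once the "phased only" and "shared only" groups are counted alongside the matches that appear on both, every match on the regular list lands in some group, which is why the combined total falls short of 100% until those groups are included.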

(10:20):
In this case, the maternal side was 0.6% of matches and the paternal side was 1.4% of matches. Now, that is just a lot of percentages that I’ve thrown out in different ways, and probably the question that a lot of you are asking is, so what can I really gather from this? Well, first off, I would say that having a phased kit is slightly better than a shared match list, but honestly not that much better. So if you have access to a shared match list but you don’t have access to a phased kit, you’re gonna still be okay. On the other hand, if you can create a phased kit, that’s going to be just a little bit better. The next thing is that about a quarter of my matches above 10 centimorgans were false. Now, the majority of your matches are going to be false matches, but that’s because they are stacked up below 10 centimorgans.

(11:19):
I was just worried about the ones above 10 centimorgans. So when you’re thinking about that as far as your research, as you get down to the matches that are in the 10 centimorgan range, recognize that one in four of ’em might not be a real match. I decided to break this down even further, and I created this table. So this is each one of the different ranges, from 10 to 15, 15 to 20, 20 to 25, and 25 to 30, and what percentage of the false matches fell into each one of those. So of all the false matches, which were a quarter of everything, 80% of ’em fell in the 10 to 15. That’s what you’d expect. That’s the smallest range. That’s where we expect most of those false matches to be. 15% were in the 15 to 20, and then 3% and 1.2%.

(12:14):
So very few of these false matches were in the highest ranges. This matches up exactly with what I would expect. However, I then expanded this table out, and I wanted to see, okay, well, what about the percentage of my matches in each range that are false matches? So if I look at all of the matches I have between 10 and 15 centimorgans, 22.9% of those are false matches. Okay, that’s expected. If I know it’s 25% overall and 22% here, that makes perfect sense. If I look at the next category, 15 to 20, about 15% of those are false matches. So it’s decreasing, which is what I expect. But what really surprised me was the 20 to 25 and the 25 to 30. If I’m looking at all of my matches in the 20 to 30 centimorgan range, it ends up that 9% or more of those matches were false matches. I actually expected a much lower percentage in that range.
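
The two tables answer different questions: what share of all false matches falls in each range, and what share of the matches within a range are false. Here is a minimal Python sketch of both calculations, assuming each match is recorded as a (centimorgans, is_false) pair. The data and the `false_match_table` name are invented for illustration and do not reproduce the numbers in the video.

```python
BINS = [(10, 15), (15, 20), (20, 25), (25, 30)]


def false_match_table(matches, bins=BINS):
    """For each cM range, report both views of the false-match percentage."""
    # Counts every false match supplied, including any outside the listed bins.
    total_false = sum(1 for _, is_false in matches if is_false)
    rows = []
    for low, high in bins:
        in_bin = [is_false for cm, is_false in matches if low <= cm < high]
        false_in_bin = sum(in_bin)
        rows.append({
            "range": (low, high),
            # First table: share of all false matches that land in this range.
            "share_of_all_false": false_in_bin / total_false if total_false else 0.0,
            # Second table: share of this range's matches that are false.
            "false_rate_in_range": false_in_bin / len(in_bin) if in_bin else 0.0,
        })
    return rows


# Hypothetical matches: (shared cM, True if found on neither parent's list).
matches = [
    (10.5, True), (12.0, False), (13.0, False), (14.9, True),
    (16.0, False), (18.0, True), (22.0, False), (27.0, False),
]

rows = false_match_table(matches)
for row in rows:
    print(row)
```

A range can hold a small share of all false matches and still have a noticeable false rate if it contains few matches overall, which is the situation described for the 20 to 30 centimorgan range.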

(13:22):
Now, when I say false match, I mean they don’t match my mother or my father, whether it’s through a phased kit or through a shared match list. They don’t match my mother or my father, so I couldn’t really be a match with them. They’re a match just because the DNA was jumping across from one chromosome to the other to create this match. From a research standpoint, that’s something that you just need to be aware of as you’re looking at each of these matches. Hey, even when you get into some of the higher values, this 20 to 30 centimorgans, one in 10 is probably a false match.

(14:09):
So if you are focused on one match and you just can’t figure out how they’re related to you, there’s a good chance that they might be a false match at that point. Now, there’s still plenty of matches that you can use to help find ancestors, so this isn’t gonna derail the overall research with DNA, but it’s something to be aware of so that you don’t get caught going down a rabbit hole where you’re really not gonna find out how that person’s related to you. Now, if you’d like to learn more about small segments and how they show that they’re not really matches, I’ve got a video up here for you. If you’d like to learn more about how to do a phased kit, you can watch this video right down here, and if you wanna subscribe to the channel, then you’ll be notified about any upcoming episodes. Put your comments below and I’ll try to answer them.