Visual Phasing with a Twist - Segment Phasing (Part 1)

Video Transcription

(00:00):
Visual phasing is a way you can take the DNA from three siblings and determine which parts they got from each grandparent. Today I’m gonna show you something that I call segment phasing, which uses the principles of visual phasing to reach the same goal.

(00:17):
Howdy, I’m Andy Lee with Family History Fanatics, where we help you understand your DNA, climb your family tree, and write your ancestors’ stories along the way. Today I’m going to be talking about segment phasing. Now, segment phasing is very similar to visual phasing and a lot of the steps are the same, but instead of using the graphics to try to determine where the recombination points are, we’re gonna actually use the segment data to determine where those recombination points are. Let’s begin by going over this spreadsheet that I’ve put together. This is in Google Docs and there is a link in the comment section down below so that you can download and duplicate this as many times as you want. Now on the setup page, there are really three different parts. There is the segment designation, which includes the colors for each type. Now initially there’s going to be the initial designation, where you don’t know which side a segment is on.

(01:10):
Then you determine which ones are on your paternal and your maternal side, and finally you determine which of your specific grandparents each of those segments comes from. Next is the siblings, and you can have up to five siblings. I’ve actually got sheets here for three, four, or five siblings and you just need to put in your GEDmatch number as a reminder to yourself. And finally, there is a chromosome information table. Now this is a table I put together based off of data that I’ve gathered to help you with some of the selections, and you’ll see that as I go through this process. Now for those of you who have used Steven Fox’s Excel sheet for visual phasing, this looks very similar, and actually when I saw his, that’s what I sort of based it around. However, this does not automate all of the pulling of information from GEDmatch.

(02:03):
You just have to do that yourself. But there are some things in this spreadsheet that are automated so that it will help you in determining where to put recombination points. Now the first step is we need to copy our data to the spreadsheet, and what we’re gonna be copying is the segment information. So for each of the siblings compared to each of the other siblings, we need to copy both the half matched regions and the fully matched regions. Back on the spreadsheet, I’ve filled in for three siblings and I’ve blacked them out for privacy. But what I’m gonna do now is go to GEDmatch and I’m going to do the one-to-one comparison for each combination of these three siblings. So on the one-to-one comparison, I only need the position, and this is just gonna give me the table and that’s all the information that I need.

(02:57):
The next thing that I want to do is I want to change the minimum segment size. Now, normally I would advise against changing this to anything less than seven. However, this is a special circumstance and you’ll see in a minute why we want to change this down, and you wanna change it down as low as you can. Right now GEDmatch allows you to change it down to 3 centimorgans. Previously they allowed you to change it all the way down to 1 centimorgan, and I found that even changing it all the way down to 1 centimorgan, there was still some more useful information. I am going to click on the prevent hard breaks box, and then I’m going to compare these two kits. Now, GEDmatch puts out this table that has all of this. Now this was just the half matched region, so that’s important to remember.

(03:44):
I’m going to just highlight all of this, everything in the table, I’m going to copy it and then I’m gonna go back over to my spreadsheet. And in the segment information, I’m going to paste this in starting in column B, and I paste all that information in there, and I really don’t need this first one that is the title row. Now I can type in what the comparison is. This was person A to person B, so I’m going to copy that all the way down. And then what is the identity? This is whether it is half or full. So I’m going to put in half and I’m gonna copy all that down. Now you’ll notice that it just changed this to yellow, and that’s part of the coding that I put in here, and that will be important later on as we start mapping out each of the segments.

(04:35):
Going back to GEDmatch, I can now do the fully matched region. All I need to do for that is click on the full match box and then I can compare. Now I have the list of fully matched segments and I’m going to copy those and I’m going to paste them into my spreadsheet. So I’ll just go down to the bottom of this in column B right there, paste them all in, and it’s highlighted all these green. It doesn’t really matter, but these are all A B, so I’m going to make that all A B. And these are all the fully identical regions, so I’m going to copy that down. And now you see that this has just changed that all to green. I’m going to do this now for A to C. And then I’m gonna do this for B to C as well. Here I’ve added all that information for the different combinations, A to B, B to C, and A to C, and I’ve gotten both the half and the full identical regions.

(05:35):
Then I’ve sorted this by the chromosome and by the start position. So now I have all of my data copied into the spreadsheet. The next step then is I need to go through and identify the recombination points on each one of the chromosomes. In order to do this, I’m going to duplicate this sheet. Now I just have three siblings, so I’m using the three-sibling sheet and I’m going to duplicate it. So now I have a copy of this. Now I’m gonna change the name of this and I’m gonna name it the name of one of the chromosomes, and I’m going to work on chromosome number 20. Normally I start from 22 and go down to one. I’m just gonna give you three chromosome examples in this. So I’m gonna start with chromosome number 20. Now I’ve relabeled this as chromosome number 20, and then I need to put in the chromosome number up here, and it is automatically now going to pull all of the chromosome 20 information from the segment sheet and put it on here.
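For anyone who would rather keep the copied GEDmatch rows outside the spreadsheet, the sort-and-filter step can be sketched in a few lines of Python. This is only an illustration, not how the spreadsheet itself works: the `Segment` fields and the helper names are made up for this sketch, the chromosome 20 values are the ones read out later in the video, and the chromosome 21 row is an invented placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    pair: str        # which two siblings were compared, e.g. "AB"
    identity: str    # "half" or "full"
    chromosome: int
    start_mb: float  # start position in megabases
    end_mb: float    # end position in megabases
    cm: float        # segment length in centimorgans


def sort_segments(segments):
    """Sort by chromosome, then by start position (the sort is stable)."""
    return sorted(segments, key=lambda s: (s.chromosome, s.start_mb))


def for_chromosome(segments, chromosome):
    """Return the sorted rows for one chromosome, like a per-chromosome sheet."""
    return [s for s in sort_segments(segments) if s.chromosome == chromosome]


rows = [
    Segment("BC", "full", 20, 32.9, 50.2, 23.6),
    Segment("AB", "half", 21, 10.0, 20.0, 15.0),  # invented placeholder row
    Segment("AC", "full", 20, 0.1, 32.9, 54.9),
    Segment("AB", "full", 20, 4.4, 19.1, 33.5),
    Segment("AB", "half", 20, 0.1, 63.0, 114.7),
]

for seg in for_chromosome(rows, 20):
    print(seg.pair, seg.identity, seg.start_mb, seg.end_mb, seg.cm)
```

Typing the chromosome number into the duplicated sheet plays the same role as the `chromosome` argument here.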

(06:33):
Now there are three, really four, sections of information to this sheet. First off is the segment information, and this is being pulled directly from that segment sheet. That’s why you need to download all that segment information first. Next is the recombination point section, and this is then going and looking at this segment information and trying to identify the recombination points based on the common characteristics. And the third is the mapping section. And this mapping section over here is where we will do basically the visual phasing that you might have learned in other videos. So let’s go over the recombination points. Now the recombination point section is looking at where the stops and starts of our half and our fully matched regions are. Now this one on chromosome number 20 of this example happens to really be a perfect example because everything lines up exactly right. What do I mean by that?

(07:32):
Well, at a recombination point, recombination is going to happen with just one person. It doesn’t happen with two people at the same time. It might happen with two people close together, but it usually happens with just one person. And so that person should show up in two different comparisons. So if I take a look at this first one, 4.4, and I go down, I can see that hey, at B C there’s a 4.4 start of a fully matched segment, and at A B there’s a 4.4 start of a fully matched segment. That means that B is the owner of that recombination point. So I now have a rough idea from this of where the recombination points are. They are at 4.4, 19.1, 32.9, and 50.2, and then the chromosome ends at 63 and it starts at 0.1. So I wanna go through and I want to label each one of these lines.
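The owner rule can be written down as a small Python sketch: collect every segment start and end, skip the ends of the chromosome, and when two different comparisons share a boundary, the sibling common to both is taken as the owner. The function name and the tuple layout are assumptions made for this sketch. The full segment running from 50.2 to 63 is left out of the data because the narration does not make clear which pair it belongs to, so that point comes back with no owner. Exact position matching only works because this chromosome lines up perfectly; real data would need some tolerance.

```python
from collections import defaultdict


def recombination_owners(segments, chrom_start, chrom_end):
    """Map each interior boundary position to the sibling who owns it.

    segments: iterable of (pair, start_mb, end_mb), with pair like "AB".
    Returns {position: owner letter, or None when undetermined}.
    """
    pairs_at = defaultdict(set)
    for pair, start, end in segments:
        for pos in (start, end):
            # The two ends of the chromosome are not recombination points.
            if pos not in (chrom_start, chrom_end):
                pairs_at[pos].add(pair)

    owners = {}
    for pos in sorted(pairs_at):
        pairs = pairs_at[pos]
        owner = None
        if len(pairs) == 2:
            first, second = pairs
            common = set(first) & set(second)
            if len(common) == 1:
                owner = common.pop()
        owners[pos] = owner
    return owners


# Chromosome 20 as described in the video (pair, start, end), in megabases.
chr20 = [
    ("AB", 0.1, 63.0), ("BC", 0.1, 63.0), ("AC", 0.1, 63.0),  # half matches
    ("AC", 0.1, 32.9),   # full
    ("BC", 4.4, 19.1),   # full
    ("AB", 4.4, 19.1),   # full
    ("BC", 32.9, 50.2),  # full
]

for position, owner in recombination_owners(chr20, 0.1, 63.0).items():
    print(position, owner)
```

With these rows the sketch assigns 4.4 and 19.1 to B and 32.9 to C, and leaves 50.2 open.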

(08:33):
So I’m gonna put a B here for the beginning. And on each one of these I’m gonna put an R because they are recombination points, and I’m gonna put an E on the very last one because it’s the end. Right above here, you can see that this has found four recombination points throughout this whole thing. So now that I’ve identified where those recombination points are, I want to label these recombination points with the megabases and where they fit in. So now it’s time to go over to our graph here. Now we have by default four recombination points already set in here. You can add or delete recombination points by just adding in new columns or deleting columns.

(09:23):
So I have four recombination points. That’s all that I need. So I’m gonna go through and I’m going to label what the megabase position is in this line right here. And all I’m doing is I’m taking the megabases right from this column right here. So this is 0.1, that’s where the start is. Then I’ve got a recombination at 4.4, one at 19.1, one at 32.9, one at 50.2, and the end at 63. One of the things you’ll notice is as I’m typing in those numbers, little bars start showing up here, and that’s because the distance between these all varies. And this column is a representation of where those are going to fit in. If I now highlight everything and I resize it all, then this is more of a relative size of how long each one of these segments is compared to the others.
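The relative sizes of those blocks are just the differences between neighboring positions. A minimal Python sketch, using the six positions given above (the function name is made up for illustration):

```python
def block_lengths(points):
    """Length in megabases of each block between consecutive positions."""
    return [round(b - a, 1) for a, b in zip(points, points[1:])]


points = [0.1, 4.4, 19.1, 32.9, 50.2, 63.0]
lengths = block_lengths(points)
total = sum(lengths)

for length in lengths:
    # Share of the chromosome covered by this block, as a percentage.
    print(length, f"{100 * length / total:.0f}%")
```

The block from 32.9 to 50.2 is the longest in megabases, and the first block from 0.1 to 4.4 is by far the shortest.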

(10:17):
Now this is in megabases. This is similar to those graphs that you would get from GEDmatch when you’re downloading the little graphics. And that’s what we’re going to be recreating here. I’ve got the megabases for each one of those recombination points labeled. The next thing I want to do is figure out how many centimorgans each one of those segments is. I’m going to be putting the centimorgan information in this row right here. And so I wanna figure out how many centimorgans are between 0.1 and 4.4, how many between 4.4 and 19.1, and so on. Now, I can use the information over in this column to figure this out. So for instance, I can see already that between 4.4 and 19.1 there are 33.5 centimorgans. So I’m just gonna put in 33.5 right there. I can also see here that between 32.9 and 50.2 is 23.6, so I can put 23.6, and I can put 36.3 right here because that last line tells me that it is 36.3.

(11:27):
Now, to fill in the other two lines, I need to look at some of the combinations. So for instance, between 0.1 and 32.9, right here, it is 54.9 centimorgans. That doesn’t help me out so much because I have two blank spots right there. And I know between 0.1 and 63 is 114.7 centimorgans. So again, that also doesn’t help me out because I still have those two blank spots. So once you get to a point like this, you need to make a rough guesstimate. Now how many centimorgans do we have left? Well, there’s 114.7 minus 93.4. That’s going to be 21.3 centimorgans. I’m going to put in five centimorgans right here and I’m going to put in the remainder right there. So that would be 16.3.
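The arithmetic behind that guesstimate, written out as a small Python sketch. The function name is invented for illustration, and the split of the leftover centimorgans (five for the first blank) is the same rough guess made in the video, not something the data dictates.

```python
def fill_two_gaps(known_cm, total_cm, first_gap_guess):
    """Fill exactly two unknown blocks so that all blocks sum to total_cm.

    known_cm: list of block lengths in centimorgans, None where unknown.
    Returns (remaining, filled), where remaining is what the two gaps share.
    """
    gaps = [i for i, value in enumerate(known_cm) if value is None]
    if len(gaps) != 2:
        raise ValueError("this sketch handles exactly two unknown blocks")
    known_sum = sum(value for value in known_cm if value is not None)
    remaining = round(total_cm - known_sum, 1)
    filled = list(known_cm)
    filled[gaps[0]] = first_gap_guess
    filled[gaps[1]] = round(remaining - first_gap_guess, 1)
    return remaining, filled


# Blocks between 0.1, 4.4, 19.1, 32.9, 50.2 and 63 on chromosome 20.
known = [None, 33.5, None, 23.6, 36.3]
remaining, filled = fill_two_gaps(known, total_cm=114.7, first_gap_guess=5.0)
print(remaining)  # 21.3
print(filled)     # [5.0, 33.5, 16.3, 23.6, 36.3]
```

As a sanity check, the first three blocks now add up to 54.8, which is within a tenth of the 54.9 centimorgans reported for 0.1 to 32.9.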

(12:31):
Now you don’t have to be exact, and if you find out that it is different later on, then you can recalculate that. But for most purposes, just a simple estimate is going to be okay. We’ve got the centimorgans all labeled. We know where the recombination points are and the distance in megabases. So now it’s time to make our color segment map. So the color segment map is basically that little graphic that you downloaded from GEDmatch if you’re using Steven Fox’s Excel spreadsheet or if you’re just doing it manually. In this case, we’re going to make our own graph based on the information that we have right here. So for each one of these lines and each of these segments, we’re gonna go in and put whether it is half matched or whether it is fully matched. You’ll notice that everything is red to begin with, indicating that it’s a no match.

(13:21):
So if we work from the top down, then we will be able to fill it all in. The one rule to remember right here is that if it is already green, don’t recolor it yellow, but if it is yellow, you can recolor it green. Let’s go through starting with the first line, on A C right here. Between 0.1 and 32.9 is going to be green, so I’m just going to put an F for full on all of those. And then I go to A B and it is half all the way across, and B C is also half all the way across, and A C is half all the way across. Now I’ve already colored these first three blocks green, so I’m not gonna color them yellow. I will just color those last two blocks yellow. So I’ve already got this first portion done. So let’s go to the next portion.
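That coloring rule is easy to get wrong by hand, so here it is as a minimal Python sketch. The letter codes are an assumption for illustration: `F` for full (green) follows the video, while `H` for half (yellow) and `N` for no match (red) are made up here. Only the segments covered so far are painted.

```python
POINTS = [0.1, 4.4, 19.1, 32.9, 50.2, 63.0]


def paint(row, start, end, code, points=POINTS):
    """Mark every block that lies inside [start, end] with the given code.

    Rule from the video: green ("F") is never recolored yellow ("H"),
    but yellow may be recolored green.
    """
    for i in range(len(points) - 1):
        inside = points[i] >= start and points[i + 1] <= end
        if inside and (code == "F" or row[i] != "F"):
            row[i] = code


# Every block starts out red (no match).
rows = {pair: ["N"] * (len(POINTS) - 1) for pair in ("AB", "AC", "BC")}

paint(rows["AC"], 0.1, 32.9, "F")  # first line: A C is full from 0.1 to 32.9
paint(rows["AB"], 0.1, 63.0, "H")  # half all the way across
paint(rows["BC"], 0.1, 63.0, "H")
paint(rows["AC"], 0.1, 63.0, "H")  # leaves the three green blocks alone

for pair, row in rows.items():
    print(pair, " ".join(row))
```

After these four steps the A C row reads F F F H H, matching the first portion of the map.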

(14:16):
B C from 4.4 to 19 is a full match and A B is also a full match there. And then B C again is a full match from 32 to 50 and A is from 50 to 63. So now I have my color map all made. With our segment map colored, we’ve completed the first five steps of segment phasing. Now there are three more steps, which I’ll cover in another video, but if you’d like to learn more about visual phasing, you can look at these videos over here, and be sure to subscribe to our channel. Make sure you click on the bell if you wanna be notified about upcoming episodes. Be sure to hit the button and leave a comment in the comment section below.