Decennial Conference on Genetic Genealogy – Sunday

I had to make a choice to get this out now and messy or later and edited… I think most people want it NOW so please don’t mind the typos. These are basically raw notes with no proofreading.

Dr. Michael Hammer presented “Ancient and Modern DNA Update: How Many Ancestral Populations for Europeans?” Ancient DNA is the key to figuring out the historical processes that produced the DNA we see today: to learn what happened in the past, we can look directly at ancient DNA. Dr. Hammer shared a study showing that European genes may have come from a third source. There is a northeast Asian-related admixture in northern Europeans.

A study by Raghavan et al. (2013) sequenced the genome of a 24,000-year-old Siberian individual, who was genetically similar to Native Americans and West Eurasians but not close to East Asians. About two or three weeks ago, a paper by Lazaridis and about 100 other authors, including Dr. Hammer, came out in Nature. It was the first study to fully sequence genomes of Neolithic and Mesolithic Europeans. Nine ancient genomes were fully sequenced, from Sweden (about 8,000 years old), Loschbour (about 8,000 years old), and Stuttgart (about 7,000 years old). The SNP data were analyzed together with more than 2,300 worldwide samples. In the PCA clusters there was a discontinuity between Europeans and Near Easterners. The genomic data from the ancient samples were projected onto a map of modern samples and formed three clusters: the Western Hunter-Gatherer (WHG) meta-population, based on the genome from Loschbour, Luxembourg; the Early European Farmer (EEF) meta-population, based on the genome from Stuttgart, Germany; and the Ancient North Eurasian (ANE) meta-population, based on the genome from Mal’ta, Siberia. The addition of ANE explains the clinal pattern seen across Europe. Formal statistical testing with f₃ statistics pointed to more than two ancestral populations.
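To make the f₃ test a little more concrete, here is a minimal Python sketch of the basic statistic, assuming per-SNP allele frequencies are already in hand. For a test population C and two reference populations A and B, f₃(C; A, B) averages (c − a)(c − b) over SNPs; a significantly negative value suggests C is a mixture of populations related to A and B. This is a simplified illustration, not the study's pipeline: the published method also corrects for sample size and uses a block jackknife for standard errors, and the frequencies below are made up.

```python
def f3(c, a, b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are per-SNP allele frequencies (0 to 1) for the test
    population C and reference populations A and B. No sample-size
    bias correction or block-jackknife standard error is applied.
    """
    if not (len(c) == len(a) == len(b)) or len(c) == 0:
        raise ValueError("need equal-length, non-empty frequency lists")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


# Made-up frequencies at four SNPs, for illustration only.
pop_a = [0.9, 0.1, 0.8, 0.2]
pop_b = [0.1, 0.9, 0.2, 0.8]
pop_c = [0.5, 0.5, 0.5, 0.5]  # sits halfway between A and B at every SNP

print(f3(pop_c, pop_a, pop_b))  # negative (about -0.125): consistent with admixture
```

A population that is not intermediate between the references, such as A itself tested against B and C, gives a positive value instead.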

What does the new model of European ancestry based on autosomal DNA mean for the NRY (the non-recombining region of the Y chromosome) and mtDNA? The Lazaridis study shows that all five Mesolithic Y chromosomes belong to haplogroup I. Three fell into I2a1b, and none into I1, which is more common locally today. Dr. Hammer reviewed the four sites that he shared in 2013: Neolithic sites in France, Spain, and Germany, plus Ötzi, the 5,000-year-old Tyrolean Iceman. At that time there was no R1b, and G2a dominated. Statistical testing showed that these populations were highly different from today’s, and there is no doubt that they were completely different Y-chromosomally. We now have a lot of new information from 15 sites.

Dr. Hammer made a chart of haplogroup frequencies. In the Mesolithic period, more than 80% were I. By the Neolithic period, G was strongly in the majority. By the Metal Age, I was back in the form of I2.

Mitochondrial DNA shows a large shift between the Mesolithic and the Neolithic, but not as much rapid change after that. Ancient DNA (aDNA) evidence supports a recent spread of haplogroup R lineages in Europe.

The Bronze Age saw the first use of metal specifically to create weapons; the sword, along with metal spears and shields, dates from this period. In the Iron Age, the first true mass production of metal tools and weapons began, revolutionizing both agriculture and warfare. From 1200 to 1000 BCE, the Celts were the most dominant Iron Age warriors.

As ancient data is incorporated into the way we do genealogy, we will all be able to map our present-day DNA onto a tree, compare it with ancient samples, and learn about our relatives from a few thousand years ago.

Questions for Dr. Hammer:

What is the haplogroup of the Siberian boy? A: R*

How do you explain the even distribution of ANE, given that it was a late entry? A: Upheaval? That is a good question. There may have been various incursions, coming from a single source or from multiple sources. There were multiple events, with movements into Europe from various directions. His best guess is that the existing distribution is the effect of a mixing process; people moving locally would also mix things up. Modeling that would require simulations.

Have you written, or will you write, this summary and make it available to us? A: At your service. I would be happy to provide some slides.

How did E1b1b get into England? A: I don’t know the answer but there was one sample in Spain. Certainly Atlantic trade is one way it could have happened. There are other more complex histories that have not been captured yet. The models are based on testing reference populations and if you don’t have those populations right, the models don’t work.

Will FTDNA show myOrigins from Hammer time? A: Bennett asked if we would like to see them spend the time coding this and the audience said YES!

Could the Y-STR data be made available to start mapping ancient lineages and finding how they match to individuals based on STRs? A: The closer we get to having ancient and modern samples shake hands, the better the genealogies will eventually match up. This is the future.

Bennett asked Dr. Hammer to explain why researchers can get DNA from ancient samples when we can’t get it from a stamp. A: It is partly about expense, but they are also getting DNA from a tooth or a bone, not from someone licking a stamp.

Max introduced group administrator Robert Baber. Robert is an electrical engineer with a doctorate in computer science. He has taught in several countries, including the US, South Africa, and Germany, and did research in Canada and Ireland. Robert talked about calculating the genetic distance between members of his Y-DNA project.

Robert talked about estimating how closely people are related, verifying that a given or conjectured ancestral tree is consistent with Y-DNA test results, deducing ancestors’ Y-DNA marker values, mutations and mutation graphs, and deriving an ancestral tree from Y-DNA only.

The Baber project members in haplogroup E were born in England and the USA and had English ancestry. The project also has members in haplogroup R, born in the US, whose English ancestry is presumed but not known. These R members fall into three subgroups that are not clearly related within the time since family names were adopted. Robert went through the process of testing different members at 111 markers to find out whether the individual marker values could show how one of the ancestors connected with the rest of the tree. Robert was then in contact with a man who he hoped would test. He had a hard time getting him to test, and then suddenly the man tested. We all know what a joyous occasion that is!

When this new member’s results came in, the same mutation appeared along two of the lines leading to the common ancestor. That would indicate the mutation occurred one generation higher, which would not work with the existing mutations up that line. In light of this, Robert retained the ancestral tree as correct, although it did not have DNA support.

In order to determine the connections and verify that the resulting tree is consistent with the Y-DNA results, Robert created a mutation graph. A mutation graph connects several haplotypes with each other and does not indicate the direction of ancestral relationships. It can be derived from a marker table alone. You don’t get names and dates, but you do get positions in the ancestral tree. A mutation graph has one node for each haplotype. To convert a mutation graph into an ancestral tree, you have to know where the top, or common ancestor, is. Some nodes may contain tested people and their ancestors. You can insert the haplotype of a more distant ancestor, e.g. the mode of other members of the same subclade but with other surnames. Robert took others from FTDNA group projects with the same haplotype to derive the modal haplotype for the mutation graph. This allowed him to confirm or replace his initial conjectures.
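Here is a rough Python sketch of going from a marker table to something like a mutation graph, assuming a simple stepwise model in which each mutation changes one marker by one repeat. It connects the observed haplotypes with a minimum spanning tree (Prim's algorithm) and computes a modal haplotype that could serve as the top of the tree. This is a simplification, not Robert's exact procedure: a real mutation graph may also need inferred haplotypes for untested ancestors as extra nodes. The function names and the five-marker haplotypes are hypothetical.

```python
from collections import Counter


def step_distance(h1, h2):
    """Single-step mutations separating two haplotypes (stepwise model)."""
    return sum(abs(x - y) for x, y in zip(h1, h2))


def modal_haplotype(haplotypes):
    """Most common value at each marker; ties go to the value seen first."""
    return tuple(Counter(values).most_common(1)[0][0] for values in zip(*haplotypes))


def mutation_graph(haplotypes):
    """Connect haplotypes with a minimum spanning tree (Prim's algorithm).

    Returns undirected edges (i, j, mutations). As in a mutation graph,
    the edges carry no direction; choosing the top (e.g. the modal
    haplotype) is a separate step.
    """
    n = len(haplotypes)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda p: step_distance(haplotypes[p[0]], haplotypes[p[1]]),
        )
        edges.append((i, j, step_distance(haplotypes[i], haplotypes[j])))
        in_tree.add(j)
    return edges


# Hypothetical five-marker haplotypes for four project members.
members = [
    (13, 24, 14, 11, 11),
    (13, 24, 14, 11, 12),
    (13, 25, 14, 11, 12),
    (13, 24, 15, 11, 11),
]
print(modal_haplotype(members))  # (13, 24, 14, 11, 11)
for i, j, steps in mutation_graph(members):
    print(f"haplotype {i} -- haplotype {j}: {steps} mutation(s)")
```

Here the modal haplotype matches member 0, which makes that node a natural candidate for the top when turning the graph into a tree.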

Robert’s conclusions are that TiP reports are based on estimated mutation rates and particular statistical assumptions and are therefore correspondingly imprecise. The analyses based on detailed marker values do not depend on estimated mutation rates or statistical assumptions and lead to more precise, detailed conclusions, and are useful for deducing the structure of an ancestral tree.

Robert recommends using TiP reports as a useful initial screening, but not relying on them any further. Always verify your tree. Verify that the number of mutations is the minimum. Derive a mutation graph, especially when in doubt about any structural aspect of a tree. Don’t overlook relevant data from pre-family-name times or from other surnames. If Robert had constructed a mutation graph at the beginning, he would have discovered that his first conjecture was wrong, and he would have discovered two additional ancestors before the additional member joined the group.
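One way to check that a conjectured tree uses the minimum number of mutations is a small-parsimony calculation, marker by marker. The Python sketch below uses Sankoff-style dynamic programming with a cost of one per repeat step, then compares that minimum with the count implied by a specific assignment of ancestral values. It assumes only tested men sit at the leaves of the tree; the tree, names, and marker values are hypothetical, and this is an illustration rather than Robert's own method.

```python
import math


def min_mutations(tree, root, leaf_values):
    """Fewest single-step mutations at one marker that can explain the
    tested men's values on a conjectured ancestral tree.

    tree maps each ancestor to a list of his children (ancestors or
    tested men); leaf_values maps each tested man to his marker value.
    Sankoff-style dynamic programming with a cost of |s - t| per edge.
    """
    lo, hi = min(leaf_values.values()), max(leaf_values.values())
    states = range(lo, hi + 1)  # an optimal ancestor value never lies outside this range

    def cost(node):
        # cost(node)[s] = fewest mutations below node if node has value s
        if node in leaf_values:
            return {s: 0 if s == leaf_values[node] else math.inf for s in states}
        child_costs = [cost(child) for child in tree[node]]
        return {
            s: sum(min(c[t] + abs(s - t) for t in states) for c in child_costs)
            for s in states
        }

    return min(cost(root).values())


def assigned_mutations(tree, values):
    """Mutations implied by a full assignment of values to every node."""
    return sum(abs(values[parent] - values[child])
               for parent, children in tree.items() for child in children)


# Hypothetical tree: ancestor A had sons B and C; m1, m2, m3 are tested men.
tree = {"A": ["B", "C"], "B": ["m1", "m2"], "C": ["m3"]}
tested = {"m1": 14, "m2": 15, "m3": 14}

conjecture_1 = {"A": 14, "B": 14, "C": 14, **tested}
conjecture_2 = {"A": 14, "B": 15, "C": 14, **tested}

print(min_mutations(tree, "A", tested))        # 1
print(assigned_mutations(tree, conjecture_1))  # 1: minimal
print(assigned_mutations(tree, conjecture_2))  # 2: not minimal, so re-examine
```

Running this for every marker and summing gives the minimum for the whole haplotype; a conjecture that needs more mutations than that deserves a second look.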

Questions for Robert:

Regarding 14 generations, did you consider a back mutation factor? A: A mutation followed by its exact reversal would have a probability of about 0.002².
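To put the 0.002² figure in context, here is a small Python calculation. It assumes a per-generation mutation probability of 0.002 at a marker (the rate implied by the answer) and independent generations. The chance of exactly two mutations at that marker within 14 generations is then a binomial term. An exact back mutation is only one kind of double mutation, so this is an upper bound for the scenario in the question.

```python
from math import comb  # Python 3.8+


def two_mutation_probability(mu, generations):
    """Probability of exactly two mutations at one marker over
    `generations` generations, assuming an independent per-generation
    mutation probability `mu`."""
    return comb(generations, 2) * mu**2 * (1 - mu) ** (generations - 2)


MU = 0.002  # rate implied by the answer; an assumption, not a measured value
print(MU**2)                             # one specific pair of generations
print(two_mutation_probability(MU, 14))  # any pair within 14 generations
```

Even summed over all 91 possible pairs of generations, the result stays a little under 4 × 10⁻⁴ per marker under these assumptions.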

How do you go from a marker table to a mutation graph? A: See photo.

Chief Scientific Officer David Mittelman, PhD, presented “Big Y: One Year Later.” This is the tenth anniversary of NGS (next-generation sequencing). Illumina sequencing is like doing Sanger sequencing in parallel, millions at a time. It is powerful and cheap, but the analysis is much more complex. The cost has levelled off. One of the biggest complexities is that the reference genome is not perfect; new refinements gradually improve reliability. FTDNA invests heavily in genome analysis methods, including work with 1000 Genomes, the Genome in a Bottle Consortium, and the Global Alliance for Genomics and Health. FTDNA built GCAT, which compares genome analysis tools and has been used by three medical schools. It can be found at www.bioplanet.com/gcat.

Big Y was announced in November 2013, and the first results were received in February 2014. Thousands have been ordered. Academic research groups are now ordering Big Y in sets of hundreds of samples to advance research into human history. That data will become available as some, if not all, of these groups publish. FTDNA has published a white paper in the Learning Center. Big Y started with a ten-week turnaround; now it is as low as 5-6 weeks, and about 8 weeks on average. They are continually working to streamline the process. David mentioned the forthcoming paper by Oleg Balanovsky, “Deep Phylogenetic Analysis of the Y Chromosomal Haplogroup G1 reveals migrations of Iranic speakers between South-West Asia and the Eurasian Steppes.”

The genealogy community is definitely engaged. David reminded everyone that, as Blaine said yesterday, DNA data can identify you. However, if you would like to contribute, FTDNA will help. Those who consented to contribute results to NCBI have since been sent a second, more detailed consent.

Alice Fairhurst administers 19 projects! Way to go, Alice!

Alice talked about the ISOGG tree. There have been more than 2 million hits in the past six months. The most viewed pages are the SNP Index and Haplogroups R, I, E, N, J, and G. They have been viewed from all continents and by universities and genetic laboratories. This year 10,213 new SNPs have been added!

As for new developments, SNP information from many more labs has resulted in multiple names for the same SNP. An experimental tree by Ray Banks has been added to many haplogroup pages so that people can project ahead. More Geno 2.0 SNPs were placed on the ISOGG tree, and they are starting to put Big Y SNPs on the tree. Alice said she was literally paralyzed by Big Y, and it took a lot of time from a lot of people to figure it out. Four new people were added to R just in the last week. R1b-U106 went from 61 subclades to 243 subclades after Big Y!

Michael Gugel, Director of Product for Family Tree DNA, gave the Products Update. FTDNA is striving for more users, which will give us more project members. They are also looking to provide more answers. Along with Bennett’s idea, they hope to have a census rather than a survey. There is a boulder in the road to getting people to test, and that boulder is money. There are a bunch of creative ways to work around this. Right now transfers are $69, and you can transfer from Ancestry or the 23andMe v3 chip. They are going to give people an opportunity to transfer 23andMe and Ancestry results for free, but there is a caveat. They will show people their top 20 matches for free, but people will not be able to contact those matches yet; they will see only the first initial and the last name. Once you see your top 20 matches, you have a decision point. Option 1 is to pay $39 to unlock all matches and get their email addresses. Option 2 is to recruit four people to transfer for free, which unlocks your matches for free. There will also be a free user experience in which people can create a family tree. This will get people in the door and educate them on the merits of genetic genealogy. One of the key reasons they implemented the family tree was so that you can create a tree from scratch rather than having to upload one. The new ability will be to search all public entries.

To facilitate communication, an alpha version of MyGroups will be released to select people today. One of the key problems is that conversations are not reaching the whole group because outside groups like Facebook are being used to communicate. These groups are active but reach only a subset. Michael compared bulk email to a sword; sometimes you need a scalpel. Problem #3 is that myfamily.com shut down and tens of thousands of users were left in the dust.

The group administrator page will not change. There will be “MyGroups” for projects, which look very similar to other social networks. Users will be able to post pictures, stories, and questions. There will be a record and an open dialog between all members. Group administrators will be able to moderate and organize all members. Five brave volunteer administrators of projects with 20 or more members will be selected to give feedback over the next few weeks. Sign up at http://tinyurl.com/ftdnagroups.

Questions for Michael:

For 23andMe uploads, will they appear in our match list if they do not pay? A: Not yet.

Must people join a project to be included in a group? A: Yes.

Who will own the rights to the intellectual property posted on group pages? A: Good question. He doesn’t have the answer yet. He will think about this more and will have a firm answer for the release.

Jason Wang, the Chief Technology Officer, gave the Tech Update. In the past year, they have doubled the size of the software team and it is now about 20 people. They come from all industries and walks of life. Some are from gaming, oil & gas, and healthcare. They all share a passion for the work that FTDNA does and want to make it the best product that it can be.

Their goal is to give you the ultimate experience at FamilyTreeDNA.com, and that comes with some fun technical challenges. They have to be able to handle rapid growth: they surpassed one million testers, and that number is growing rapidly. Matching calculations grow much faster than the number of testers, because each new tester has to be compared with everyone already in the database. There is a lot of engineering to get ahead of the curve with this increasing user base. They aim to keep the website online 99.9% of the time, store enormous amounts of data, and serve fast-loading pages. They hope to get uptime up to 99.999% so that they don’t get phone calls in the middle of the night. Security, privacy, and protection from spammers, hackers, and scrapers are major issues.
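For a sense of what those uptime targets mean in practice, this small Python sketch converts an availability figure into allowed downtime per year, assuming a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability, e.g. 0.999."""
    return (1 - availability) * MINUTES_PER_YEAR


for target in (0.999, 0.99999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target:.3%} uptime allows {minutes:.1f} minutes "
          f"({minutes / 60:.2f} hours) of downtime a year")
```

Going from 99.9% to 99.999% shrinks the yearly budget from roughly 8.8 hours to roughly 5 minutes.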

A lot of Jason’s focus for the past year has been horizontal scaling. In traditional IT, the way to add more power was to add more to one server to make a super server. The problem is that you eventually reach the limits of what you can put in a single computer. Horizontal scaling instead uses a number of moderately powerful machines, called commodity hardware. This has been a big area of focus, and it is used in the technologies for autosomal matching, X matches, myOrigins, and Big Y. They used horizontal scaling and cloud technology to crunch the X data within hours instead of months, using 1,000 cloud machines; with the cloud, you pay only for the time you are using the servers. They are expanding storage capacity and have upgraded in-house storage to 450TB, expandable to 1.5PB. They have new cloud storage of 150TB with unlimited expansion. Data is replicated across multiple devices for redundancy.
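Here is a simplified Python illustration of why horizontal scaling suits matching. For all-against-all comparison, the number of pairs grows with roughly the square of the number of testers, but each pair can be compared independently, so the work can be split across many machines. The round-robin sharding scheme below is a hypothetical example, not FTDNA's actual design, and real matching systems may use indexing to avoid comparing every pair.

```python
def total_pairs(n):
    """All-against-all comparisons among n testers: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_for_worker(n, worker, workers):
    """Yield the pairs (i, j), i < j, assigned to one worker.

    Rows are dealt round-robin (row i goes to worker i % workers), so long
    and short rows are spread across machines. Each worker needs no
    shared state, which is what lets the job scale horizontally.
    """
    for i in range(worker, n, workers):
        for j in range(i + 1, n):
            yield (i, j)


print(total_pairs(1_000_000))  # 499999500000

# Small check: four workers together cover every pair of 50 testers once.
n, workers = 50, 4
shards = [list(pairs_for_worker(n, w, workers)) for w in range(workers)]
print([len(shard) for shard in shards])
```

Adding machines divides the work without changing the code each machine runs, which is the appeal of paying for 1,000 cloud machines for a few hours.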

Security is a major focus. They have improved their security algorithms so that they are virtually unbreakable. They are using SSL secure checkout, and never store your credit card info. There are three levels of security – network, server, and application level, with 24/7 monitoring and alerts. They also use de-identification and signed URLs for sensitive data.
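Signed URLs are a common way to hand out temporary links to sensitive files. Below is a minimal Python sketch using the standard hmac module, assuming a server-side secret key and an expiry time embedded in the URL. The key, the path, and the query parameter names are hypothetical, and this is not FTDNA's implementation.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical key


def _signature(path, expires):
    message = f"{path}?expires={expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path, ttl_seconds, now=None):
    """Return path plus an expiry time and an HMAC-SHA256 signature."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url, now=None):
    """Accept the URL only if it is unexpired and the signature matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(_signature(parts.path, expires), signature)


link = sign_url("/downloads/raw-data.zip", ttl_seconds=300)
print(link)
print(verify_url(link))  # True until the link expires
```

Changing the path or the expiry time invalidates the signature, so a leaked link is only useful briefly and only for the one file.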

Reliability is king. The production data center has 99.9% uptime, with redundant power and cooling systems, 24/7/365 monitored video surveillance, and redundant internet with multiple tier 1 providers. In the office they have battery power backup, an on-site power generator, and data redundancy, and they monitor everything. They also have automated testing plus a manual QA process, and fault-tolerant applications. They are trying to push out code that is as bug-free as possible.

FTDNA has invested in upgrading hardware. They have reduced database server load from 95% to an average of 10%. They have also implemented fully automatic failover clustering, and they are working on implementing multiple levels of caching. Jason is very excited about their Content Delivery Network (CDN). There are a lot of images and assets on the site, and the CDN distributes them across 50 locations on 5 continents to improve download speeds around the world.

Questions for Jason:

Please explain what personal information will be released if I pay $39? A: The information that is released depends on your personal privacy settings. By default, they are more conservative. It is fully customizable.

For login security, are there plans for a two-step verification? A: Max asked us to raise our hands and the attendees clearly wanted no second level verification for login.

FTDNA Lab Manager Connie Bormans gave the Lab Update. Over the past year, the lab has been extremely busy. They’ve had an increase in lab personnel, automation and equipment, an increase in test offerings, and an increase in throughput. They can process more samples with fewer people. They are actively working to decrease turnaround times.

In the past year, they have added six members to the lab: last year there were about 15, and now there are over 20. With that growth, they’ve tried to make the lab more efficient. The lab is now divided into two departments. Production is responsible for all current tests, and research and development is responsible for new technologies and new assay development. The production group includes the microarray, NGS, sequencing, STR, mtDNA, and data analysis groups. The data analysis group is very meticulous and can sit for long periods of time looking at results.

In the past year, they have expanded the layout of the lab to almost double the physical size. They have added new office space so that all lab personnel are in one place. They have also added two new robots for increased throughput and automation for Big Y Sample Prep, DNA Extraction, and Y STRs and SNPs.

In the last year, many new things have happened in the lab. They have designed and released over 2,300 new SNPs. They have increased DNA extraction capacity to 1,600 samples per day and now have over 300k samples extracted and stored. They have doubled mtDNA capacity, so they can process samples twice as fast. QC checks have been added to predict the success of NGS tests. They have introduced a new M222 deep clade test and hope to expand to new haplogroups with group administrator help. This is the fifth year of maintaining accreditations from outside agencies; they currently hold three individual certifications.

In terms of NGS testing, the DNA requirements are far more stringent than for any other test. There is a multi-step process for processing samples. Factors include the age of the sample, the quality of the extracted DNA, the quality of the product after the first step of the NGS process, and the coverage of the sample after the run. They are testing new equipment and technology to improve the pass rates and throughput for these samples.

Connie thanked all of the members of the lab who work so hard on continual process improvement. They welcome feedback: they hear it, and they are working on it.

Questions for Connie:

How about a big shoutout for the Customer Service folks? Introduce them! They are the day to day of FTDNA! A: Yes! I love you all! They have done so much.

When a new sample is required and you used the last vial, do you contact the person? My father received a new kit and not an email. He thought it was a security scam. A: If we notice there is a little left and it’s on the border, we will process the test but just in case, we send a new kit out. In this case, there is no notification because it has not technically failed. They can implement the notification when they mail out the kit.

I am a custodian for DNA samples for a deceased person. Can we find out if what you have on file is enough for a certain test? A: Yes, we can do some initial QC. We can’t give a definitive answer, but we can tell you if it is definitely not good.

About how many years until you expect FTDNA to offer full genome sequencing for ancestry? A: Bennett said that depends on technology. As the manufacturers offer longer read lengths and the technology improves, it will get better. If completeness and neatness is your goal, you would do better to wait, since the current “full genome” is a misnomer.

Max recognized Michelle, who manages customer service. Michelle then recognized Morgan, the support supervisor, and the rest of the support team. They also recognized Janine.

Q&A by Bennett:

What will the process be to determine what SNPs will be in the deep clade panels? A: We are hoping that the community will do that. Those who are intimately involved probably have a better idea of which SNPs they want. We want to do an excalibur job working in conjunction with ISOGG to vet the tree and work collaboratively. Barring non-optimal SNPs, they would like to put everything else in the panel. It is projected to take about two weeks to develop a panel. They will soon be doing two panels at a time. They will then make a decision to call on the community to order the test to prove the position of the SNPs and then make them publicly available for release. He cannot predict whether it will be 1, 2, or 8 panels per month. His guess is that they will try to reorganize people in the lab so that when there is a curated, completed sample set they will go through it. The goal is to do multiple panels in one month. He does not want to come back next year to hear that there is a panel that someone wants that they’ve been waiting months for.

Can those of us here at the conference collect a new sample to be stored for use? A: There is no answer at this time; he read the question to address it to the employees. If you call the lab, you can find out whether you have an unextracted vial in the storage room. If you don’t have an extra vial and the test has only been run once, there is probably plenty of DNA left. If you are concerned, for example about a parent who is still alive, just explain your concern to the customer service department and a new collection can easily be arranged. From an efficiency standpoint, the first thing to do is to find out whether they do or don’t have more sample in house.

After a short break, there was a Q&A Period.

Can you construct a general R1b panel in addition to a specific panel? A: Bennett said no, he cannot construct a general R1b panel because, with so many SNPs, it would cost $300-$400. Instead, they will look at the data for the suggested panels, see how they cover the map of the tree, and develop a very small panel of 8-10 SNPs that will tell us which subclade an individual is in. They don’t know how they will deal with that at this point. It will be a biological, statistical, and mathematical issue.
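A small panel can work because SNPs sit on a tree: a man's subclade is the deepest SNP he is positive for, provided the tested SNPs above it are also positive. The Python sketch below assumes a hypothetical tree with placeholder SNP names and simple positive/negative results. It is illustrative only, not how FTDNA will assign subclades.

```python
# Hypothetical tree with placeholder SNP names: each SNP maps to its parent.
PARENT = {
    "ROOT": None,
    "A1": "ROOT",
    "A1a": "A1",
    "A1b": "A1",
    "A1b1": "A1b",
}


def lineage(snp):
    """The SNP followed by all of its ancestors, ending at the root."""
    chain = []
    while snp is not None:
        chain.append(snp)
        snp = PARENT[snp]
    return chain


def assign_subclade(results):
    """Deepest positive SNP whose tested ancestors are all positive.

    results maps SNP name -> True (derived) or False (ancestral);
    untested SNPs are simply absent.
    """
    best = None
    for snp, positive in results.items():
        if not positive:
            continue
        if any(results.get(anc) is False for anc in lineage(snp)[1:]):
            continue  # contradicts a tested ancestor; skip it
        if best is None or len(lineage(snp)) > len(lineage(best)):
            best = snp
    return best


print(assign_subclade({"A1": True, "A1a": False, "A1b": True}))  # A1b
```

Note that the answer can only be as deep as the SNPs actually tested: here A1b1 was not on the panel, so A1b is the best this result can say.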

Please explain what admins will do to manage/moderate MyGroups. A: Michael said nothing is required; you can moderate, but you don’t have to.

Must people join a project to be included in MyGroups? A: Yes for now but maybe not in the future.

Will admins control who can post? A: No, whoever is in the group can post. All posts will be stored.

Regarding 23andMe format, what format do you accept? A: You can download the raw data or you can extract it.

Will it be possible to import 23andMe v4 chips? A: Michael said yes, but Bennett said that to import a v4 chip they would have to use imputation. They would have to do a lot of math to see how well imputation would work. If it doesn’t work well, it would be the same as uploading a file with a weak call rate. While it is technically possible, it might be a lot more challenging than it is worth. When resources are freed up, they will look into this more. It will not be trivial to do.

For the $39 transfer, will we get full access? A: Yes. It’s the same kind of thing you see when you purchase Family Finder.

If you have a gedcom on the site, does it need to be re-uploaded? A: No

Our chromosome browser should be able to search for all matches on a specific segment. Is that possible? A: Bennett says this is high on the visibility list.

Will 23andMe uploads appear in our match list if they don’t upgrade? A: No, not until they recruit four friends or family members or pay the $39 fee.

Regarding Big Y, what are calls? A: Jason said calls are variant calls. There is a reference that is an average of white European males. A variant is a difference from this reference, and a call is a position where the sample differs.
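A minimal Python sketch of the idea, assuming the sample and the reference are already aligned base-for-base. Real Big Y analysis aligns millions of short reads and weighs read depth and quality before making a call. Positions marked "N" are treated as no-calls. The function name and sequences are hypothetical.

```python
def variant_calls(reference, sample, start=1):
    """Positions where the sample's base differs from the reference.

    Returns (position, reference_base, sample_base) tuples, numbering
    positions from `start`. Sample positions with no confident call
    ('N') are skipped rather than reported as variants.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if alt_base != "N" and alt_base != ref_base
    ]


print(variant_calls("ACGTACGT", "ACGAACNT"))  # [(4, 'T', 'A')]
```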

What causes poor quality DNA? A: Connie said the single most important factor is the enthusiasm with which someone scrapes. The harder you scrape, the more they get. The best ones are red.

My project member was told that her sample was inconclusive. The FAQ says she has to pay $50 for a new lab test. Is this true? A: Inconclusive means the call rate was below 97%. If you have a DNA sequence of 100 bases and you only have 75, you can’t say what the whole sequence is. Since there are over 700,000 SNPs, they need data for over 97% of them; otherwise the results are affected and the test is inconclusive. The single most important reason is DNA quality. After they run it twice, they lose money. Bennett added that they followed 23andMe on that, after looking to see what competitors are doing. The buccal swab produces the highest call percentage of any modality other than blood.
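The call-rate rule can be sketched in a few lines of Python. This assumes each SNP in the raw data has either a genotype such as "AG" or a no-call marker; the "--" marker is an assumption about the file format, and the 97% threshold is the figure quoted in the answer.

```python
NO_CALL = "--"    # assumed no-call marker in the raw genotype data
THRESHOLD = 0.97  # below this, the result is reported as inconclusive


def call_rate(genotypes):
    """Fraction of SNPs that received a genotype call."""
    if not genotypes:
        raise ValueError("no genotypes")
    return sum(1 for g in genotypes if g != NO_CALL) / len(genotypes)


def is_inconclusive(genotypes, threshold=THRESHOLD):
    return call_rate(genotypes) < threshold


# The 100-base example from the answer: 75 called, 25 missing.
sample = ["AG"] * 75 + [NO_CALL] * 25
print(call_rate(sample), is_inconclusive(sample))  # 0.75 True
```

With over 700,000 SNPs, a 97% threshold still tolerates roughly 21,000 no-calls, which shows how poor a sample has to be before it fails.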

To whom do I address a “use my dna for anything” email? A: David@familytreedna.com

When is the new standard haplotree coming out? A: They are now planning to attack each relevant subclade one at a time with the new deep clade technology. They anticipate those coming out on a regular basis starting in a couple of weeks.

When can we see a match of a match? A: Bennett said it’s a privacy issue and they don’t want any more lawsuits. Max said if he has to vote on that, he will say no. This opens the door to data mining.

I ordered Big Y at the conference. A year later my terminal SNP still shows as R-L21 on the badge on the personal page. Can I expect to see an update on a personal SNP? A: Absolutely, yes. It will get on the upgrade list. It should and will be done.

Will the age of the sample impact the deep clade results? A: The deep clade test is not NGS, so the age of the sample should have less of an effect. They will work with some low-grade samples to define limits for the assay. Of the 96 samples run for the M222 test, all passed. These included all previously extracted samples from the frost-free freezer, ranging in age from a few months to several years.

Can you download and compare results for your full mtDNA sequence? A: They have that already.

When is a SNP terminal? A: It will never be terminal until we know the entire evolutionary state of mankind.

Why do men have far fewer X-chromosome matches than women? A: It makes sense. Women have two X chromosomes, and heterozygous values are treated as wildcards by the calculations. This is why a woman will, in general, have a larger number of matches. For a male, you know his X came entirely from the maternal line.
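A rough Python sketch of the wildcard effect, assuming genotypes are compared for half-identity (sharing at least one allele) on unphased data and a match is scored by the longest run of consecutive half-identical SNPs. This is not FTDNA's algorithm: real matching measures segments in centiMorgans against thresholds, and the genotypes below are hypothetical.

```python
def half_identical(g1, g2):
    """True if two unphased genotypes (e.g. 'AG', 'GG') share an allele."""
    return bool(set(g1) & set(g2))


def longest_half_identical_run(person1, person2):
    """Length of the longest run of consecutive half-identical SNPs."""
    best = current = 0
    for g1, g2 in zip(person1, person2):
        current = current + 1 if half_identical(g1, g2) else 0
        best = max(best, current)
    return best


# Hypothetical X-chromosome genotypes at six biallelic SNPs.
woman = ["AG", "CT", "AG", "CT", "AG", "CT"]  # heterozygous everywhere
man_1 = ["AA", "TT", "GG", "CC", "AA", "TT"]  # one X, so every call looks homozygous
man_2 = ["GG", "CC", "AA", "TT", "AA", "CC"]

print(longest_half_identical_run(woman, man_1))  # 6: her heterozygous calls match anything
print(longest_half_identical_run(man_1, man_2))  # 1
```

At a biallelic SNP, a heterozygous call shares an allele with every possible genotype, so stretches of heterozygous calls on a woman's X behave like wildcards and inflate her match count.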

As academic research on Big Y results is published, will those results be part of the FTDNA Haplotree? A: Jason said yes. There is some computational work on the back end, but their goal is to be able to add these new SNPs and have them show up on the match page.

Max said that it is extremely important to have feedback regarding the things we’ve heard this weekend and the things they’re working on. Please be sure to give feedback at www.familytreedna.com/suggestions.

Bennett was asked to share one or two of his biggest surprises over the last decade. One is that he’s still here after all of these years. Another is how responsive the general citizenry has been, and how receptive to something new. It shows that you are never too old to learn something new. He knows that he is talking science and biology to non-science majors: people signed up to create a family tree, and now they learn about biology, point mutations, and STRs. One of the biggest surprises is how well the community has done coming forward to deal with this. One of his greatest pleasures has been the opportunity to meet so many people, virtually or in person, and the biggest pain point has been seeing so many people pass away over the years. That is the hardest thing. Bennett and Max have had the chance to embrace people from the community, and, from the bottom of his heart, the last 14 years have been the climax of his business and personal career.

Max said this all sounds like a retirement speech and this is NOT a retirement speech.

With that, Bennett thanked everyone and said he looks forward to seeing everyone in the future.
