Max Blankfeld opened the conference with a welcome. He talked about how competition makes for healthy growth. This has been the best year ever, and there are great things in the pipeline.
In June of this year, what started as an acquisition became something that could guarantee continuity for Family Tree DNA. Three new people have joined the staff: Nir Leibovich is the Chief Business Officer, David Mittelman is the Chief Scientific Officer, and Jason Wang is the Chief Technology Officer, leading the Engineering Department.
Max got a little emotional as he talked about great supporters of Family Tree DNA who are no longer with us. He stressed how much he appreciates what they brought to the company and, most importantly, their friendship. He also sent best wishes to Bill Hurst for a speedy recovery.
Bennett stepped up to join Max to talk about group administrators who have been with them for 10 years or more. Each of these group administrators will receive a plaque to honor their years of commitment.
Bennett talked a bit about the lab tours. Approximately 60 people in total have attended or will attend lab tours during this conference. Bennett also shared the news about the lab’s CAP certification, which allows the company to justify acquiring some expensive instruments that will benefit the genealogical community.
The first speaker of the day was Amy McGuire, PhD, JD, of the Center for Medical Ethics and Health Policy at Baylor College of Medicine, presenting Am I My Brother’s Keeper? DNA Identifiability and Obligations to Biological Relatives in Genetic Genealogy. Amy approached the talk from a policy and ethics perspective. She discussed what obligations are owed to biological relatives across all areas of genetics. These obligations may differ by context, such as research, clinical care, and direct-to-consumer (DTC) testing. The three major issues are DNA identifiability, data sharing and privacy, and obligations to biological relatives.
Amy shared the article Identifying Personal Genomes by Surname Inference, published in Science earlier this year by a group from MIT. Amy worked with them on the ethical issues raised by what they had discovered. They took data from the 1000 Genomes research project, linked it to genetic genealogy databases, and were able to identify nearly 50 people. The method was published, but the names of the participants were not. Other studies since 2004 had also indicated that this was possible. In 2004 a group from Stanford published Genomic Research and Human Subject Privacy, finding that just 30-60 significant SNP positions were needed for identification. In 2008 David Craig’s group published Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. This study showed that you could detect whether an individual’s DNA was present in a pooled sample.
The MIT study was different because it was the first to identify individuals using just the internet, without the need for reference sample data. The data providing the surname link may have been contributed by someone the participant did not even know, such as a distant relative on the paternal side.
This research was inspired by a 2005 Washington Post story, Found on the Web, With DNA: a Boy’s Father. The boy used the Family Tree DNA database to find a surname match for his sperm donor. He also had enough other information about his biological father to narrow the search to a couple of people in that region and then identify his father.
Individuals vary widely in how concerned they are about privacy, being identified, and having their genetic information revealed. Some people put it all out there, and some are extremely concerned. Amy feels that we should respect individual opinions and make sure that people are informed.
Dr. McGuire also spoke on eugenics. In the very famous case of Buck v. Bell (U.S. 1927), the Supreme Court affirmed the constitutionality of Virginia’s sterilization law in the case of a third-generation institutionalized woman. Eugenics ideology was deeply embedded in U.S. popular culture in the 1920s and 1930s. Hitler closely followed U.S. eugenic legislation, and his policies were heavily informed by what was going on in the United States.
The field of genomics has completely exploded in the past decade. In 2001, the Human Genome Project cost $2.7B. By 2007 a 454 Life Sciences machine was used to sequence the first individual genome, Jim Watson’s, for $1M in about one month. In 2009, Complete Genomics reported that they could sequence a complete genome for $4,400. Last summer Ion Torrent said they could sequence an entire genome for $1,000 in one day, and Oxford Nanopore announced a USB sequencer.
The traditional means of protecting privacy was deidentification. Once studies in 2005 and beyond showed that deidentified data could be reidentified, that was no longer enough. In the research context, NIH responded with a policy shift to controlled-access databases.
Some open-access research databases still exist:
- Human Genome Project
- HapMap
- 1000 Genomes
- Personal Genomes (PGP, Venter, Watson)
- Human Microbiome Project
A Baylor study gave participants in genetic studies three consent options: open access, a controlled-access database, or no data sharing. After debriefing, 53.1% chose release to open access, 33.1% chose restricted access, and 13.7% were in favor of no release. All of the participants were highly motivated because they were patients or family members who trusted their physicians; some other populations are much less willing to share. The majority (86%) of participants reported that it is important or extremely important for them to be involved in the decision about whether to share their genetic information. They expressed that they wanted to be respected.
From an ethical and policy perspective, we need to be concerned about family members. James Watson wanted his whole genome put on the internet except for his APOE status, a gene associated with Alzheimer’s disease risk, because his grandmother died from Alzheimer’s. Watson did not want the scientists to contact his sons, so the scientists were reluctant to release the data, and the study almost came to a halt. They ended up compromising by giving him his data to do with as he chose.
The central issue is balancing the autonomy of the individual against the interests of biological relatives. Some research ethics statements encourage including relevant family members in the consent process.
Earlier this year, the genome of the HeLa cell line, derived from Henrietta Lacks, was sequenced twice. This is very controversial. Rebecca Skloot responded with an article in the NY Times pointing out that no one obtained consent before publication. Because the cell line is publicly linked to Henrietta Lacks, the data could reveal genetic information about her family, and publishing it was not smart. The sequence data was pulled back, and ultimately an agreement was negotiated with the family to put the data in a restricted database, with two family members on the committee that makes decisions about its use.
At a policy level, this is messy. How far out in the family do we go when asking permission?
Miguel Vilar presented Geno 2.0 Update and Y-2014 Tree. Miguel, a molecular anthropologist, is the Scientific Manager who works with Spencer Wells at the Genographic Project in Washington, D.C. Dr. Vilar talked about the Legacy projects and shared photos of cultural and language preservation projects. They’ve given 75 grants since 2005, totaling more than $2 million on five continents. In 2013 there have been 96 applications, nearly triple the average!
At the twelve Genographic research centers spanning six continents, the Genographic Project obtained 72,000 indigenous samples between 2005 and 2012.
Genographic has published 42 manuscripts and presented at more than 80 professional conferences, both nationally and internationally. The most recent paper is on the mtDNA of central Europe.
Seven scientific grants have been given since the launch of the Scientific Grant program in 2012. These include The Peopling of the Caribbean and the Origin and Spread of Indo-European, among others.
The Genographic website has been a great success. They are also active on Facebook and Twitter and have a Genographic blog.
Public participation in Geno 1.0, from 2005 to 2011, was about 500,000 participants. For mitochondrial DNA they tested the control region and 8 SNPs; for the Y chromosome, they tested Y-STRs and 17 SNPs. Geno 2.0 started in 2012 and, as of November 2013, has about 80,000 participants.
So what is the Geno Chip? It’s an Illumina array of about 150,000 SNPs: 130,000 non-medical autosomal SNPs, 3,000 mtDNA SNPs, and 17,000 Y-SNPs. The results show deep ancestry, recent ancestry, and the hominid story.
People have been looking at mtDNA since the 1980s. In the late 1990s the community adopted letter nomenclature. The mitochondrial DNA tree still has the same inner structure and the same haplogroups, but full mitochondrial genomes have given us bushy branch tips.
By the mid-2000s scientists thought they were done with the Y-DNA tree. Geno 1.0 looked at a few Y-DNA STRs but not many SNPs. The field went back to the trees to look at more SNPs, and some branches of the tree are now 10 times larger. People are working together to make sense of it.
Dr. Vilar showed Haplogroup Q1a1a on the 2014 tree. Results showed that 60% of the mitochondrial DNA of Puerto Rico was indigenous and 0% of the Y-DNA was indigenous. Wow. The Spanish really left their mark.
Given the trend in consumer genetics, they expect public participation to double by next year. They are also working on a new chip to come out next year.
Matt Dexter presented Autosomal Analysis. He has been an IT professional for more than 20 years, and his interest in DNA testing started because he is an adoptee. When Matt was nine years old, he found out at the doctor’s office that he had been adopted through a family arrangement. He didn’t look for his birth mother until 2008, when he was able to find and meet her. She said, “Well, I know who your dad is. I just don’t remember his name.” Matt found out that his father’s name was Dale St. Clair, but that turned out not to be the man’s real name. Matt had to be creative to find him, and he was excited when he did, but their Y-DNA testing showed that Dale was not the father. His mother then said there had been another man, even though she had named Dale as the father.
Matt tested himself, his mother, his wife, his wife’s parents, his five children, and his four grandchildren, among other people.
Matt likes to think of the autosomal test as two tests, one for each side. He explained that each of us starts as a diploid zygote formed from two haploid gametes, one from each parent.
Matt talked about whether two of your matches are related to each other if their segments overlap. The answer depends on whether they connect on your maternal or your paternal side. Even if their segments overlap, if they don’t match each other, they are not related to each other through that segment. If they overlap and are connected through the same parent, then they are related to each other. Matt showed an example where a parent and child both matched the grandchild. Some people can match you on both the maternal and the paternal side. You have to look at other relatives and other information to find out which of your two copies of the chromosome a match maps to.
Matt also explained inheritance with a creative example: pouring orange juice from two glasses into one glass. Each of the original glasses still held the 50% that was not contributed. Comparing your data to each parent’s side to sort out which DNA came from whom is called phasing your data.
Matt pointed out that sometimes you find a lot of matches on one side because one branch had many children; another reason could be random variation in inheritance. Children inherit 50% from each parent, but at the grandparent level, the 50% that a parent passes on is not necessarily an even split between the two grandparents.
Additionally, not every chromosome mixes at crossover. Sometimes an entire chromosome passes without recombination, and this is also normal for the X chromosome. Crossovers are not necessarily proportional, either.
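To make the overlap logic concrete, here is a minimal Python sketch, assuming each match is recorded as a shared segment (chromosome, start and end position in base pairs) and that you already know whether the two matches match each other on that region. The names `Segment`, `overlaps`, and `triangulates` are hypothetical and not part of any Family Tree DNA or GEDmatch tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A segment you share with a match: chromosome and start/end positions (bp)."""
    chrom: int
    start: int
    end: int


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments are on the same chromosome and share positions."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def triangulates(seg_a: Segment, seg_b: Segment, a_matches_b: bool) -> bool:
    """Overlap alone is not enough: the two matches must also match each other.

    If A and B overlap you on the same region but do not match each other,
    they are probably on different copies (maternal vs paternal) of that
    chromosome.
    """
    return overlaps(seg_a, seg_b) and a_matches_b


# Hypothetical example: two matches overlapping on chromosome 7
cousin_1 = Segment(7, 20_000_000, 35_000_000)
cousin_2 = Segment(7, 30_000_000, 42_000_000)

print(triangulates(cousin_1, cousin_2, a_matches_b=True))   # True: likely the same side
print(triangulates(cousin_1, cousin_2, a_matches_b=False))  # False: likely opposite sides
```

The cheap overlap check comes first, but the deciding question is whether the two matches match each other.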
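Phasing against a parent can be sketched in a few lines of Python. This assumes genotypes are recorded as two-letter strings such as AG for the same SNPs in child and mother; `phase_snp` is a hypothetical helper, and real phasing tools also have to deal with no-calls, strand differences, and data from both parents.

```python
from typing import Optional, Tuple

Phased = Tuple[Optional[str], Optional[str]]  # (maternal allele, paternal allele)


def phase_snp(child: str, mother: str) -> Phased:
    """Split a child's genotype (e.g. "AG") into maternal and paternal alleles.

    Returns (None, None) when one parent cannot resolve the SNP (mother is
    heterozygous with the same two alleles) or when the genotypes are
    inconsistent (the child shares no allele with the mother).
    """
    c1, c2 = child[0], child[1]
    if c1 not in mother and c2 not in mother:
        return (None, None)  # inconsistent: likely a miscall or wrong parent
    if c1 == c2:
        return (c1, c1)  # homozygous: both copies are the same allele
    shared = {c1, c2} & set(mother)
    if len(shared) == 2:
        return (None, None)  # ambiguous: mother could have given either allele
    maternal = shared.pop()  # the one allele the mother could have passed on
    paternal = c2 if maternal == c1 else c1
    return (maternal, paternal)


child = ["AG", "CC", "CT", "AG"]
mother = ["AA", "CT", "CT", "GG"]
for c, m in zip(child, mother):
    print(c, m, phase_snp(c, m))
```

Whatever is left after removing the mother’s contribution is, in this simple model, the father’s side: the juice that came from the other glass.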
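A small Python simulation shows why the grandparent split varies. It assumes a simple model with no crossover interference (crossovers fall at random, about one per 100 cM) and uses rough, illustrative chromosome lengths rather than a real genetic map; recombination hotspots and the X chromosome are ignored, and `transmit` and `grandparent_share` are hypothetical names.

```python
import random

# Rough, illustrative autosome lengths in centimorgans (an assumption, not a real map)
CHROM_CM = [280, 260, 220, 210, 200, 190, 180, 165, 160, 175, 155,
            170, 125, 120, 140, 135, 130, 120, 110, 95, 55, 75]


def transmit(length_cm: float, rng: random.Random) -> float:
    """Fraction of one transmitted chromosome that came from grandparent #1.

    Crossovers are a Poisson process averaging one per 100 cM (no interference).
    With zero crossovers the chromosome passes intact from one grandparent.
    """
    on_gp1 = rng.random() < 0.5  # which grandparent's copy the chromosome starts on
    pos = gp1_cm = 0.0
    while pos < length_cm:
        nxt = min(pos + rng.expovariate(1 / 100), length_cm)  # next crossover or end
        if on_gp1:
            gp1_cm += nxt - pos
        on_gp1 = not on_gp1  # switch copies at each crossover
        pos = nxt
    return gp1_cm / length_cm


def grandparent_share(rng: random.Random) -> float:
    """Percent of a child's autosomal DNA from grandparent #1 (weighted by cM)."""
    total = sum(CHROM_CM)
    half_from_gp1 = sum(transmit(L, rng) * L for L in CHROM_CM) / total
    return 50 * half_from_gp1  # the parent supplies 50% of the child's DNA


rng = random.Random(2013)
shares = [grandparent_share(rng) for _ in range(1000)]
print(f"average {sum(shares) / len(shares):.1f}%, "
      f"range {min(shares):.1f}% to {max(shares):.1f}%")
```

The average lands near the textbook 25%, but individual results spread noticeably around it, and in this model short chromosomes frequently pass with no crossover at all.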
Since I have been interested in hotspots of recombination on the chromosome, I was happy to see Matt’s examples of the crossover events at the expected chromosomal hotspots. It works! It really works. It’s always nice when that happens.
Matt did a nice job explaining a subject that can be challenging for many who don’t have a lot of autosomal experience. It was perhaps a bit complicated for beginners, but I’m sure there were many in the audience who understood what he was talking about. His wonderful bank of family tests has allowed him to provide examples that many of us are not lucky enough to have.
I attended the breakout session with Debbie Parker Wayne on mtDNA Tools and Techniques. Debbie reminded us that DNA is just one piece of evidence. A thorough, or “reasonably exhaustive”, genetic search may involve testing multiple people, and who and how many people are tested depends on the research problem. Debbie always suggests testing as many people as you can, as soon as you can. You never know when you will lose someone. She reviewed a chart of mtDNA inheritance, showed how to choose which people in a family group to test for mtDNA to achieve the desired results, and explained HVR1, HVR2, and the Full Mitochondrial Sequence. Debbie also talked about the Cambridge Reference Sequence (CRS) and the Revised Cambridge Reference Sequence (rCRS) and the importance of knowing which one is being used.
The RSRS, or Reconstructed Sapiens Reference Sequence, was published by Doron Behar. Sometimes the RSRS inserts “N” as needed to retain rCRS position numbering. They changed some haplogroup names, but tried to keep changes to a minimum since so much published literature uses the old names. There was some question about whether the RSRS would become widely accepted in the scientific community. Behar published the paper in 2012, and 16 months later it had been cited 32 times in Google Scholar. Some comments were positive, some were cautionary, and some suggested improvements.
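The rule behind choosing mtDNA testers (everyone on the same unbroken female line carries the same mtDNA, barring new mutations) can be expressed as a short Python sketch. The family below is entirely hypothetical, and `mtdna_candidates` is an illustrative helper, not a feature of any testing company’s site.

```python
from typing import Dict, List, Optional

# Hypothetical family: each person maps to their mother (None if unknown)
MOTHER: Dict[str, Optional[str]] = {
    "Grandma Ann": None,
    "Aunt Beth": "Grandma Ann",
    "Mom Carol": "Grandma Ann",
    "Uncle Dave": "Grandma Ann",
    "Cousin Ed": "Aunt Beth",
    "Me": "Mom Carol",
    "Dad Frank": None,
    "Uncle Dave's wife": None,
    "Cousin Gina": "Uncle Dave's wife",
}


def matrilineal_root(person: str) -> str:
    """Follow mother links up to the earliest known direct-maternal ancestor."""
    while MOTHER.get(person):
        person = MOTHER[person]
    return person


def mtdna_candidates(target: str) -> List[str]:
    """Everyone in the tree expected to carry the target's mtDNA
    (same unbroken female line), ignoring new mutations."""
    root = matrilineal_root(target)
    return sorted(p for p in MOTHER if matrilineal_root(p) == root)


print(mtdna_candidates("Me"))
```

Uncle Dave carries the family mtDNA, but his daughter Gina does not, which is why sons can be tested for their mother’s line but cannot pass it on.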
Debbie noted that she had checked with Bennett and as of last week, there are 30,781 Full Mitochondrial Sequence results in the FTDNA database.
Since I’ve already seen Tim Janzen’s Autosomal Mapping presentation and mostly because Roberta Estes is so awesome, I chose to attend her presentation How to Find Your Indian Prince(ss) Without Having to Kiss Too Many Frogs. Roberta says that where there is smoke, there may be fire. It may not be what you think or where you think because stories lose “facts” with each generational retelling.
Historically, “Indians” and “Blacks” were viewed as one group because they were people of color. For those once held in slavery, freedom had to be approved by the state governing body. Freed slaves and Indian, black, or mixed-race people were second-class citizens. They could not own land, vote, or testify against whites, and they were taxed at a higher rate. Tribal Indians were often invisible because they were not taxed. Treaties led to Indian reservations and group land ownership. Until 1835, some Indians and mixed-race people could vote, go to school, and testify in court.
European expansion led to Indians being pushed off their land. Treaties were broken. Indians were assimilated and removed. The saying “the only good Indian is a dead Indian” reflected a real attitude. In 1830 Andrew Jackson signed the Indian Removal Act, forcing removal west of the Mississippi. Enslaved Indians became admixed and had the same status as other slaves. Disease and assimilation had decimated the eastern tribes. Mixed-race people were often, but not always, considered to be “of color”. Tribes lived on reservations and did not pay taxes.
During removal, only Indians on reservations had to go. Mixed-race people not on reservations and Indians living on private land did not have to leave. Rolls were taken, but not everyone participated. The Cherokee believed the Treaty of New Echota was illegal because it was not ratified by the tribe. Being “Indian” was not a good thing in the 1830s, and fear was rampant. In 1838, after the Trail of Tears, some of the survivors NEVER spoke again.
Assimilation began as early as Jamestown in 1614 with Pocahontas. Her name was Matoaka; she was baptized and renamed Rebecca Rolfe. Bride ships didn’t arrive until 1620. Assimilation also started with slavery. Traders had “Country Wives” in every village, which made trading easier. It was a status symbol for the women because they got English goods. The Native custom was that when a traveler came, they provided a bedmate for the night. This was usually one of the young women who were referred to as “Trader girls”. There was also assimilation with the traders’ slaves. People were kidnapped all the time, between tribes, slaves, and whites, and captives were often adopted. There were also traditional marriages, and generally the female was Native and the male was European. Roberta does not know of any that were the other way around.
In 1819, one third of the white Cherokee tribal members were reported to be white females who had been captured as children. In the 1835 removal rolls, 211 whites were recorded as “married in” and another 100 were living with the Cherokee, and 23% of the tribe was listed as “admixed”.
An Indian Princess Grandma might be admixed through slavery, through assimilation before the 1830s, or through tribal assimilation. She might also have been a tribal member in the 1830s; there are rolls from the 1830s forward.
The thing is, she might be African and not Native, but this doesn’t go over very well. Roberta says that she was even referred to as one of the “enemies” after publishing that information. Geography is a clue if you can figure out who was really “Indian”.
Roberta advises revisiting the family history and trying to discern which line the Indian Grandma was on. DNA testing can help with this: both mitochondrial and Y-line DNA can give you haplogroups that could be Native. Roberta suggests that you “Data Mine for Ancestors”.
The Native American mtDNA haplogroups include A, B, C, D, and X. These are also found in Asian and European populations, but you can find a detailed list on Roberta’s blog. Some haplogroups and subgroups are not yet proven; X2b, A3, A4a, and A4a1 are currently being looked at. They are also still working on an A2 project.
Native Y-line haplogroups include subsets of haplogroups Q and C. C3b (P39) is found heavily in Canada. Most Q-M3 and several other subgroups are showing as Native. If you carry the value of 9 at the autosomal marker D9S919, you can prove you carry Native ancestry. Not carrying it does not exclude it, though; about 30% of Natives carry the value of 9.
As for ethnicity tests on her Native side, Family Finder found 0% Native, 23andMe 1%, Ancestry 1%, deCODE 5% (and 6% on the X), and Doug McDonald estimated 1-3%. As a rough guide, an ancestor 5-6 generations back, born around 1800, shows up as about 1%.
If these tests don’t show what you need, start kissing some frogs: move to third-party tools. GEDmatch allows you to match smaller segments, paint chromosomes using ethnicity tools, and compare with others. Roberta did a full series on her blog called “The Autosomal Me”, with painfully detailed instructions. Read it all before you start! There might be gems in there, but you might not want to do the whole thing.
Roberta designed a technique called minority admixture mapping, which looks for segments from your personal minority population. Using chromosome painting on GEDmatch, she found that her largest Native segments were on chromosomes 1 and 2.
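The arithmetic behind that rough rule is simple halving: on average you carry half as much autosomal DNA from an ancestor with each generation back. A minimal Python sketch, assuming parents count as generation 1; `expected_share_pct` is an illustrative name.

```python
def expected_share_pct(generations_back: int) -> float:
    """Average % of autosomal DNA inherited from one ancestor n generations back
    (parents = 1, grandparents = 2, ...). Real amounts vary around this average."""
    return 100 / 2 ** generations_back


for n in range(1, 8):
    print(n, f"{expected_share_pct(n):.2f}%")
```

With that convention, a single ancestor 6 generations back averages about 1.6% and one 7 generations back about 0.8%, so results of a percent or two are what you would expect that far back. Actual amounts can be higher or lower because of random recombination.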
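The idea of hunting for your largest minority segments can be illustrated with a small Python sketch. It assumes you have exported painted windows as (chromosome, start, end, population label) rows with half-open positions; the data is made up, `minority_segments` is hypothetical, and this is neither Roberta’s actual procedure nor a GEDmatch export format.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, str]  # (chromosome, start, end, population label)
Span = Tuple[int, int, int]         # (chromosome, start, end)


def minority_segments(windows: List[Window], label: str) -> List[Span]:
    """Merge touching windows painted with `label`; return longest segments first.

    Positions are half-open [start, end), so windows sharing a boundary touch.
    """
    segments: List[Span] = []
    for chrom, start, end, lab in sorted(windows):
        if lab != label:
            continue
        if segments and segments[-1][0] == chrom and start <= segments[-1][2]:
            c, s, e = segments[-1]
            segments[-1] = (c, s, max(e, end))  # extend the current run
        else:
            segments.append((chrom, start, end))  # start a new run
    return sorted(segments, key=lambda seg: seg[2] - seg[1], reverse=True)


# Made-up painting for two chromosomes
windows = [
    (1, 0, 5_000_000, "European"),
    (1, 5_000_000, 10_000_000, "Native"),
    (1, 10_000_000, 18_000_000, "Native"),
    (2, 0, 4_000_000, "Native"),
    (2, 4_000_000, 9_000_000, "African"),
    (2, 9_000_000, 12_000_000, "Native"),
]
print(minority_segments(windows, "Native"))
```

Windows of another population break a run, so only uninterrupted stretches of the minority label are merged.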
David Mittelman, the Chief Scientific Officer of Gene by Gene and Family Tree DNA, presented “Next-generation Sequencing at Family Tree DNA”. David started by addressing privacy, which evolves with technology; it is important to keep our consent policies up to date.
David said that the state-of-the-art lab hopes to double in size next year and bring in more instruments.
The Sanger sequencing we used for the first 30 years was very slow. The lab is now moving to Next-generation Sequencing with Illumina products. Illumina made two innovations. They got rid of the terminator and use a reversible terminator instead. You can sequence your two copies separately. This new equipment can sequence your whole genome in a day.
The cost per genome has dropped significantly, so getting data is not that hard. But how do we put all that data back together? There are lots of platforms and sequencing applications, each with its own errors and biases, and there are even more ways to analyze the data.
At Arpeggi, they built technology to take the information off the machine and put it back together. There is no point in sequencing tons of information all at once unless you can also assemble it all at once. Jason Wang analyzed each of the individual fragments separately and processed all of the data in parallel, so he can put massive amounts of data into position in very little time.
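The idea of placing many short fragments independently and in parallel can be sketched with a toy seed-and-check aligner in Python. This is not Arpeggi’s software: it assumes exact matches, a tiny made-up reference, a 4-base seed, and hypothetical helpers `build_index` and `place_read`; real aligners tolerate sequencing errors and handle billions of reads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

K = 4  # seed length; real aligners use longer seeds plus error-tolerant extension


def build_index(reference: str) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - K + 1):
        index.setdefault(reference[i:i + K], []).append(i)
    return index


def place_read(read: str, reference: str, index: Dict[str, List[int]]) -> Optional[int]:
    """Return the first reference position where the read matches exactly,
    using its first k-mer as a seed; None if it cannot be placed."""
    for pos in index.get(read[:K], []):
        if reference[pos:pos + len(read)] == read:
            return pos
    return None


reference = "ACGTTGCAAGGCTTACGGATCCATG"
reads = ["TTGCAAGG", "GGATCCAT", "ACGTTG", "TTTTTTTT"]
index = build_index(reference)

# Each read is independent of the others, so reads can be placed concurrently.
with ThreadPoolExecutor() as pool:
    positions = list(pool.map(lambda r: place_read(r, reference, index), reads))

print(positions)
```

Because no read depends on any other, the same pattern scales out; for CPU-heavy work you would swap in `ProcessPoolExecutor` or spread reads across machines.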
Sequencers are already getting tiny. There is a new sequencer that is the size of a mouse.
At Arpeggi they built a platform that puts the information back together not only quickly, but accurately. A tool can be found online at www.bioplanet.com/gcat. They also teamed up with the National Institute of Standards and Technology (NIST) and, using a truth set, were able to start ranking tools.
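Ranking tools against a truth set usually comes down to counting how many calls agree with it. A minimal Python sketch, assuming variant calls are reduced to (chromosome, position, alternate allele) tuples; the variants and the tool names `tool_x` and `tool_y` are made up, and this is a generic precision/recall comparison, not GCAT’s exact scoring.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str]  # (chromosome, position, alternate allele)


def score(calls: Set[Variant], truth: Set[Variant]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a tool's variant calls against a truth set."""
    tp = len(calls & truth)  # calls confirmed by the truth set
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


truth = {("chr20", 100, "A"), ("chr20", 250, "T"), ("chr20", 400, "G"), ("chr20", 900, "C")}
tools = {
    "tool_x": {("chr20", 100, "A"), ("chr20", 250, "T"), ("chr20", 400, "G"), ("chr20", 555, "A")},
    "tool_y": {("chr20", 100, "A"), ("chr20", 900, "C")},
}

# Rank tools by F1, best first
ranking = sorted(tools, key=lambda name: score(tools[name], truth)[2], reverse=True)
for name in ranking:
    p, r, f = score(tools[name], truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Precision penalizes false calls and recall penalizes missed ones; F1 combines the two, which is why a cautious tool that calls little can still rank below one that is right more often overall.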
What does this mean for genealogy and YOU?
It means a lot! Give them six months and they’ll come up with even more ideas, but for today there is a case study: querying SNPs on the Y chromosome. Individual Y-SNP tests, from the YCC2010 tree or by user request, cost $39 per SNP. NatGeo array-based Y-SNP testing covers 10-12k SNPs for $100. WTY (Walk Through the Y) testing, which covered 300,000 Y positions, was $950 to $1,500, but it is no longer offered.
With next-gen sequencing, can we reconstruct a big chunk of the Y? How much can we get? They did experiments with their friends at Illumina and got 10 million positions on the Y chromosome with nearly 25,000 known SNPs.
Available TODAY: the BIG Y DNA Test, Next Gen Y Sequencing. This is comprehensive sequencing of the Y chromosome. Existing Y-DNA customers can order today and get a $200 discount off the regular price of $695, for $495.
Sorry for the bad grammar, but I’m getting tired. Also, the placement of punctuation outside the quotation marks is on purpose. I will not conform to mindlessly placing commas inside my quotation marks. More tomorrow!
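For a rough sense of why sequencing changes the economics, here is the cost per Y position for each option quoted above, using the low end of each quoted range. The 10 million figure comes from the Illumina experiments, so treat it as approximate coverage rather than a product specification; `cost_per_position` is just an illustrative helper.

```python
def cost_per_position(price_usd: float, positions: int) -> float:
    """Dollars paid per Y-chromosome position (or SNP) tested."""
    return price_usd / positions


# (price in USD, positions covered), low end of each range quoted in the talk
options = {
    "Single Y-SNP test": (39, 1),
    "NatGeo array Y-SNPs": (100, 10_000),
    "WTY (no longer offered)": (950, 300_000),
    "BIG Y, regular price": (695, 10_000_000),
    "BIG Y, existing customers": (495, 10_000_000),
}

for name, (price, positions) in options.items():
    print(f"{name}: ${cost_per_position(price, positions):.7f} per position")
```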