Genomic Research and Sibling Identifiability
Author Information
Author(s): Cassa Christopher A, Schmidt Brian, Kohane Isaac S, Mandl Kenneth D
Primary Institution: Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology
Hypothesis
How much familial information can be inferred from genomic data, particularly regarding siblings?
Conclusion
Substantial discrimination and privacy risks arise from the use of inferred familial genomic data.
Supporting Evidence
- Sibling SNP genotypes can be inferred with substantial accuracy.
- A very low number of matches at commonly varying SNPs is sufficient to confirm sib-ship.
- Using HapMap trio data, the authors achieved 91.9% inference accuracy for sibling genotypes.
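To make the sib-ship point concrete, the sketch below shows one way that genotype matches across SNPs could be combined into a single score. It is not the authors' published procedure. It assumes Hardy-Weinberg genotype frequencies, full siblings sharing 0, 1, or 2 alleles identical by descent with probabilities 1/4, 1/2, and 1/4, and independent loci. Genotypes are coded as the number of minor alleles (0, 1, 2), and the function names `hwe`, `sib_conditional`, and `log_lr_sibship` are hypothetical.

```python
import math

def hwe(maf):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg."""
    p = 1.0 - maf
    return [p * p, 2 * p * maf, maf * maf]

def sib_conditional(g1, g2, maf):
    """P(sibling has genotype g2 | tested person has genotype g1)."""
    p = 1.0 - maf
    # One allele shared by descent: rows are g1, columns are g2.
    ibd1 = [[p, maf, 0.0], [p / 2, 0.5, maf / 2], [0.0, p, maf]]
    # Two alleles shared by descent: the genotypes must be identical.
    ibd2 = 1.0 if g1 == g2 else 0.0
    # No allele shared: the sibling looks like a random draw from the population.
    return 0.25 * hwe(maf)[g2] + 0.5 * ibd1[g1][g2] + 0.25 * ibd2

def log_lr_sibship(loci):
    """Sum of per-locus log likelihood ratios (siblings vs. unrelated).

    loci: iterable of (g1, g2, maf) with 0 < maf < 1.
    Summing the logs is valid only if the loci are independent.
    """
    return sum(
        math.log(sib_conditional(g1, g2, maf) / hwe(maf)[g2])
        for g1, g2, maf in loci
    )

# Ten loci where both people are homozygous for the same allele, maf = 0.5.
score = log_lr_sibship([(0, 0, 0.5)] * 10)
```

A positive score favours the sibling hypothesis and a negative score favours unrelated individuals. Because each matching locus adds to the total, evidence accumulates quickly, which is consistent with the claim that few matches are needed.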
Takeaway
This study shows that we can guess a sibling's genetic information from another sibling's DNA, which can lead to privacy issues.
Methodology
The study used a framework to measure the risk of SNP genotype disclosure to siblings and demonstrated inference techniques using HapMap data.
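As a rough illustration of what inferring a sibling's genotype can mean at a single SNP, the following sketch computes the distribution of one sibling's genotype given the other's. It is a minimal textbook model and not the framework used in the study: it assumes random mating, Hardy-Weinberg genotype frequencies for the parents, Mendelian transmission, and no genotyping error. Genotypes are coded as the number of minor alleles (0, 1, 2), and all function names are hypothetical.

```python
from itertools import product

def hwe_prior(maf):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg."""
    p = 1.0 - maf
    return [p * p, 2 * p * maf, maf * maf]

def child_given_parents(mother, father):
    """Distribution of a child's genotype given both parental genotypes."""
    m = mother / 2.0  # chance the mother transmits the minor allele
    f = father / 2.0  # chance the father transmits the minor allele
    return [(1 - m) * (1 - f), m * (1 - f) + (1 - m) * f, m * f]

def sibling_distribution(known_genotype, maf):
    """P(sibling genotype | known genotype), summing over unobserved parents."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must be strictly between 0 and 1")
    prior = hwe_prior(maf)
    joint = [0.0, 0.0, 0.0]
    for mother, father in product(range(3), repeat=2):
        child = child_given_parents(mother, father)
        # Weight each parental pair by its prior and by how well it
        # explains the genotype that was actually observed.
        weight = prior[mother] * prior[father] * child[known_genotype]
        for g in range(3):
            joint[g] += weight * child[g]
    total = sum(joint)
    return [x / total for x in joint]

# Known sibling is homozygous for the major allele at a SNP with maf = 0.5.
dist = sibling_distribution(0, 0.5)
best_guess = max(range(3), key=lambda g: dist[g])
```

Taking the most probable genotype as the guess gives a per-SNP prediction; how often such guesses are correct depends on the allele frequency at each SNP.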
Potential Biases
The approach does not account for potential genotyping errors and assumes that loci are independent.
Limitations
The study relies on population-based estimates of minor allele frequency drawn from the HapMap population, which is a small sample.
Participant Demographics
The study used data from the HapMap CEPH population, which includes 90 participants of northern and western European ancestry.
Statistical Information
P-Value
0.0001
Statistical Significance
p<0.05
Digital Object Identifier (DOI)