Genomic Research and Sibling Identifiability

Sample size: 452684 | Evidence: high

Author Information

Author(s): Christopher A. Cassa, Brian Schmidt, Isaac S. Kohane, Kenneth D. Mandl

Primary Institution: Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology

How much familial information can be inferred from genomic data, particularly regarding siblings?

Substantial discrimination and privacy risks arise from the use of inferred familial genomic data.

Sibling SNP genotypes can be inferred with substantial accuracy.
Matching genotypes at only a small number of common SNPs are sufficient to confirm sibship.
Using HapMap trio data, the authors inferred sibling genotypes with 91.9% accuracy.
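The genotype inference behind these findings can be illustrated with the textbook full-sibling model, in which two siblings share 0, 1 or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2 and 1/4. The Python sketch below assumes a biallelic SNP, Hardy-Weinberg equilibrium, a known minor allele frequency `p`, and no genotyping error. The function names are hypothetical, and this is not necessarily the authors' exact method.

```python
# Genotypes are coded as the number of minor alleles carried: 0, 1 or 2.
# Textbook full-sibling probabilities of sharing 0, 1 or 2 alleles IBD.
K0, K1, K2 = 0.25, 0.5, 0.25


def hwe(p: float) -> list[float]:
    """Population genotype frequencies under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def one_ibd(g: int, p: float) -> list[float]:
    """Sibling genotype distribution when exactly one allele is shared IBD
    with a person of genotype g; the other allele is drawn from the population."""
    q = 1.0 - p
    if g == 0:  # the shared allele must be the major allele
        return [q, p, 0.0]
    if g == 2:  # the shared allele must be the minor allele
        return [0.0, q, p]
    # heterozygote: the shared allele is minor or major with probability 1/2
    return [q / 2, 0.5, p / 2]


def sibling_distribution(g: int, p: float) -> list[float]:
    """P(sibling has 0, 1 or 2 minor alleles | observed sibling genotype g)."""
    pop = hwe(p)
    ibd1 = one_ibd(g, p)
    same = [1.0 if x == g else 0.0 for x in range(3)]  # both alleles IBD
    return [K0 * pop[x] + K1 * ibd1[x] + K2 * same[x] for x in range(3)]


def best_guess(g: int, p: float) -> tuple[int, float]:
    """Most probable sibling genotype and the probability that it is correct."""
    dist = sibling_distribution(g, p)
    x = max(range(3), key=lambda i: dist[i])
    return x, dist[x]
```

For example, if one sibling is homozygous for the major allele at a SNP with `p = 0.1`, `best_guess(0, 0.1)` predicts the same genotype for the other sibling with a probability of about 0.9025 under these assumptions.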

This study shows that a sibling's genetic information can be partly inferred from another sibling's DNA, which raises privacy concerns.

The study used a framework to quantify the risk that an individual's SNP genotypes disclose their siblings' genotypes, and demonstrated the inference techniques on HapMap data.
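One simple way to express such a risk per locus, offered here as a hypothetical illustration rather than the study's own framework, is the expected probability that the most likely sibling genotype is correct, summed over loci. The sketch assumes Hardy-Weinberg equilibrium, full-sib IBD probabilities of 1/4, 1/2 and 1/4, independent loci, and known minor allele frequencies; all function names are hypothetical.

```python
def sib_conditional(g: int, p: float) -> list[float]:
    """P(sibling genotype x | observed genotype g), x = 0, 1, 2 minor alleles,
    using full-sib IBD probabilities 1/4, 1/2, 1/4 under Hardy-Weinberg."""
    q = 1.0 - p
    pop = [q * q, 2 * p * q, p * p]
    # Distribution when exactly one allele is shared IBD with genotype g.
    ibd1 = {0: [q, p, 0.0], 1: [q / 2, 0.5, p / 2], 2: [0.0, q, p]}[g]
    return [0.25 * pop[x] + 0.5 * ibd1[x] + 0.25 * (1.0 if x == g else 0.0)
            for x in range(3)]


def expected_accuracy(p: float) -> float:
    """Chance that guessing the most probable sibling genotype is correct,
    averaged over the observed sibling's genotype drawn from the population."""
    q = 1.0 - p
    pop = [q * q, 2 * p * q, p * p]
    return sum(pop[g] * max(sib_conditional(g, p)) for g in range(3))


def expected_disclosure(mafs: list[float]) -> float:
    """Expected number of sibling genotypes guessed correctly across loci,
    assuming the loci are independent."""
    return sum(expected_accuracy(p) for p in mafs)
```

Under these assumptions `expected_accuracy(0.5)` is 0.59375, compared with 0.5 when guessing the most common population genotype (the heterozygote) without any sibling data. The expected count grows linearly with the number of independent loci.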

The approach does not account for potential genotyping errors and assumes that loci are independent.
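The independence assumption is what allows evidence from many SNPs to be combined by simple multiplication. As a hedged sketch of how sibship could be tested under that assumption, the code sums per-locus log-likelihood ratios comparing "full siblings" with "unrelated"; because P(g1) cancels, each ratio reduces to P(g2 | g1, siblings) / P(g2). Hardy-Weinberg equilibrium, known allele frequencies strictly between 0 and 1, and the absence of genotyping error are assumptions, and the function names are hypothetical.

```python
import math


def sib_conditional(g: int, p: float) -> list[float]:
    """P(sibling genotype x | observed genotype g), x = 0, 1, 2 minor alleles,
    using full-sib IBD probabilities 1/4, 1/2, 1/4 under Hardy-Weinberg."""
    q = 1.0 - p
    pop = [q * q, 2 * p * q, p * p]
    ibd1 = {0: [q, p, 0.0], 1: [q / 2, 0.5, p / 2], 2: [0.0, q, p]}[g]
    return [0.25 * pop[x] + 0.5 * ibd1[x] + 0.25 * (1.0 if x == g else 0.0)
            for x in range(3)]


def sibship_log_lr(pairs: list[tuple[int, int]], mafs: list[float]) -> float:
    """Natural-log likelihood ratio of 'full siblings' vs 'unrelated' for two
    people genotyped at the same SNPs. Summing per-locus terms is valid only
    if the loci are independent (no linkage disequilibrium)."""
    if len(pairs) != len(mafs):
        raise ValueError("need one allele frequency per genotype pair")
    total = 0.0
    for (g1, g2), p in zip(pairs, mafs):
        q = 1.0 - p
        pop = [q * q, 2 * p * q, p * p]  # P(g2) if the two are unrelated
        total += math.log(sib_conditional(g1, p)[g2] / pop[g2])
    return total
```

At a SNP with `p = 0.5`, two matching homozygotes give a likelihood ratio of 2.25, two heterozygotes 1.25, and opposite homozygotes 0.25. A positive total favors sibship; any decision threshold is a choice left to the analyst. Correlated (linked) SNPs would make the summed evidence overconfident, which is why the independence assumption matters.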

The study relies on population-based minor allele frequency estimates from the HapMap sample, which is small.

The study used data from the HapMap CEPH population, which includes 90 participants of northern and western European ancestry.
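Because allele frequencies are estimated from so few people, they carry sampling error that propagates into every inferred probability. The sketch computes the binomial standard error and a rough normal-approximation (Wald) 95% interval for an estimated minor allele frequency. It treats sampled chromosomes as independent; counting only unrelated individuals, for example the 60 parents among 90 trio members (120 chromosomes), is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def maf_standard_error(p_hat: float, n_chromosomes: int) -> float:
    """Binomial standard error of an allele-frequency estimate, treating the
    n_chromosomes sampled alleles as independent draws."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_chromosomes)


def approx_95ci(p_hat: float, n_chromosomes: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% interval, clipped to [0, 1]."""
    half = 1.96 * maf_standard_error(p_hat, n_chromosomes)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)
```

For an estimated frequency of 0.1 from 120 chromosomes, `approx_95ci(0.1, 120)` gives roughly 0.046 to 0.154, a wide range relative to the estimate itself. The Wald interval is crude for rare alleles; a Wilson or exact binomial interval would be preferable in practice.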

