Annotating genomes with massive-scale RNA sequencing
2008

Building Gene Models from RNA Sequencing Data

Sample size: 175,000,000 reads

Evidence: high

Author Information

Author(s): Denoeud France, Aury Jean-Marc, Da Silva Corinne, Noel Benjamin, Rogier Odile, Delledonne Massimo, Morgante Michele, Valle Giorgio, Wincker Patrick, Scarpelli Claude, Jaillon Olivier, Artiguenave François

Primary Institution: CEA, DSV, Institut de Génomique, Genoscope

Hypothesis

Can RNA-Seq data be used to build gene models de novo without prior knowledge of known genes?

Conclusion

The G-Mo.R-Se method effectively builds gene models from RNA-Seq data, identifying more loci than traditional cDNA sequencing at a lower cost.

Supporting Evidence

  • The G-Mo.R-Se method produced 46,062 transcript models clustered in 19,486 loci.
  • G-Mo.R-Se detected more loci than traditional cDNA sequencing and recovered 70% of the loci identified by cDNA sequencing.
  • The method cost about 20 times less than cDNA sequencing.

Takeaway

Scientists created a new way to find genes using RNA data, which helps discover more genes than older methods.

Methodology

The study used RNA-Seq data to build gene models by mapping reads to the grapevine genome and validating junctions between candidate exons.
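
The two steps named above (candidate exons from mapped reads, then junction validation) can be illustrated with a toy sketch. This is not the G-Mo.R-Se implementation: the function names, the depth threshold, the flank length, and the use of unmapped reads as junction evidence are all assumptions made for illustration, and the sketch ignores strand, splice-site refinement, and sequencing errors.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the genome


def coverage(genome_length: int, mapped_reads: List[Interval]) -> List[int]:
    """Per-base read depth computed from mapped read intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in mapped_reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def candidate_exons(depth: List[int], min_depth: int) -> List[Interval]:
    """Maximal runs of positions whose depth is at least min_depth (hypothetical threshold)."""
    exons, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            exons.append((start, pos))
            start = None
    if start is not None:
        exons.append((start, len(depth)))
    return exons


def junction_supported(genome: str, left: Interval, right: Interval,
                       reads: List[str], flank: int) -> bool:
    """True if some read contains the end of `left` joined directly to the start of `right`."""
    left_part = genome[max(left[0], left[1] - flank):left[1]]
    right_part = genome[right[0]:min(right[1], right[0] + flank)]
    probe = left_part + right_part
    return any(probe in read for read in reads)


# Toy genome: exon at [2, 8), intron at [8, 16), exon at [16, 22).
genome = "TT" + "ACGTAC" + "GTAAAAAG" + "CCGGAT" + "TT"
mapped_reads = [(2, 6), (3, 8), (2, 8), (16, 20), (17, 22), (16, 22)]
unmapped_reads = ["GTACCCGG", "TTTTTTTT"]  # the first read spans the exon-exon junction

depth = coverage(len(genome), mapped_reads)
exons = candidate_exons(depth, min_depth=2)
supported = junction_supported(genome, exons[0], exons[1], unmapped_reads, flank=3)
```

In this toy case the two covered regions become two candidate exons, and the junction between them is kept because one read contains the joined exon ends. A candidate junction with no such supporting read would be discarded.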

Potential Biases

Potential biases in read coverage and mapping could affect the accuracy of the gene models.

Limitations

The method does not produce mono-exonic models and may miss some alternative splicing events.

Digital Object Identifier (DOI)

10.1186/gb-2008-9-12-r175
