Course - detail

LGN5835 - Applied Bioinformatics


Credit hours

In-class work per week: 2
Practice per week: 2
Credits: 8
Duration: 15 weeks
Total: 120 hours

Instructor
Gabriel Rodrigues Alves Margarido

Objective
This course offers students a theoretical and practical view of the technologies available for
high-throughput genome sequencing, with emphasis on the tools available for data analysis. It will
enable students to apply genotyping and transcriptomic strategies to breeding.

Content
Genomics and Bioinformatics. Modern technologies for DNA sequencing. Pre-processing of nucleotide
sequencing data. Biological sequence alignment. De novo genome sequencing and assembly.
Polymorphism discovery and genotyping: genomic resequencing; sequencing of reduced
representation genomic libraries (GBS and RAD). Functional genomics and transcriptomics. De novo
transcriptome assembly. Differential gene expression. Functional enrichment analysis.
Software and platforms to be used include: Bowtie, HISAT, BWA-MEM, IGV, TASSEL-GBS, GATK,
FreeBayes, Trinity, R, R/Bioconductor, edgeR, goseq.

Bibliography
Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic Local Alignment Search Tool.
Journal of Molecular Biology, v. 215, p. 403-410, 1990.
Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research,
v. 25, p. 3389-3402, 1997.
Anders, S.; Huber, W. Differential expression analysis for sequence count data. Genome Biology, v. 11,
R106, 2010.
Anders, S.; Pyl, P.T.; Huber, W. HTSeq—a Python framework to work with high-throughput sequencing
data. Bioinformatics, v. 31, p. 166-169, 2015.
Baker, M. De novo genome assembly: what every biologist should know. Nature Methods, v. 9,
p. 333-337, 2012.
Catchen, J.M.; Amores, A.; Hohenlohe, P.; Cresko, W.; Postlethwait, J.H. Stacks: Building and
Genotyping Loci De Novo From Short-Read Sequences. G3, v. 1, p. 171-182, 2011.
Davey, J.W.; Blaxter, M.L. RADSeq: next-generation population genetics. Briefings in Functional
Genomics, v. 9, p. 416-423, 2010.
Eaton, D.A.R. PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, v. 30,
p. 1844-1849, 2014.
Elshire, R.J.; Glaubitz, J.C.; Sun, Q.; Poland, J.A.; Kawamoto, K.; Buckler, E.S.; Mitchell, S.E. A Robust,
Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE, v. 6, e19379,
2011.
Flicek, P.; Birney, E. Sense from sequence reads: methods for alignment and assembly. Nature Methods,
v. 6, S6-S12, 2009.
Garber, M. et al. Computational methods for transcriptome annotation and quantification using RNA-seq.
Nature Methods, v. 8, p. 469-477, 2011.
Glaubitz, J.C.; Casstevens, T.M.; Lu, F.; Harriman, J.; Elshire, R.J.; Sun, Q.; Buckler, E.S. TASSEL-GBS:
A High Capacity Genotyping by Sequencing Analysis Pipeline. PLoS ONE, v. 9, e90346, 2014.
Gotoh, O. An Improved Algorithm for Matching Biological Sequences. Journal of Molecular Biology, v.
162, p. 705-708, 1982.
Grabherr, M.G. et al. Full-length transcriptome assembly from RNA-seq data without a reference
genome. Nature Biotechnology, v. 29, p. 644-652, 2011.
Green, E. Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics, v. 2,
p. 573-583, 2001.
Haas, B.J.; Zody, M. Advancing RNA-Seq analysis. Nature Biotechnology, v. 28, p. 421-423, 2010.
Haas, B.J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for
reference generation and analysis. Nature Protocols, v. 8, p. 1494-1512, 2013.
Langmead, B.; Trapnell, C.; Pop, M.; Salzberg, S.L. Ultrafast and memory-efficient alignment of short
DNA sequences to the human genome. Genome Biology, v. 10, R25, 2009.
Li, H.; Homer, N. A survey of sequence alignment algorithms for next-generation sequencing. Briefings
in Bioinformatics, v. 11, p. 473-483, 2010.
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv,
1303.3997, 2013.
Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data
with DESeq2. Genome Biology, v. 15, 550, 2014.
Needleman, S.B.; Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino
Acid Sequence of Two Proteins. Journal of Molecular Biology, v. 48, p. 443-453, 1970.
Oshlack, A.; Robinson, M.; Young, M. From RNA-seq reads to differential expression results. Genome
Biology, v. 11, 220, 2010.
Peterson, B.K.; Weber, J.N.; Kay, E.H.; Fisher, H.S.; Hoekstra, H.E. Double Digest RADseq: An
Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS
ONE, v. 7, e37135, 2012.
Pevsner, J. Bioinformatics and Functional Genomics. 2 ed. Wiley-Blackwell, 992p. 2009.
Poland, J.A.; Rife, T.W. Genotyping-by-Sequencing for Plant Breeding and Genetics. The Plant Genome,
v. 5, p. 92-102, 2012.
Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: a Bioconductor package for differential expression
analysis of digital gene expression data. Bioinformatics, v. 26, p. 139-140, 2010.
Smith, T.F.; Waterman, M.S. Identification of Common Molecular Subsequences. Journal of Molecular
Biology, v. 147, p. 195-197, 1981.
Trapnell, C.; Pachter, L.; Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq.
Bioinformatics, v. 25, p. 1105-1111, 2009.
Trapnell, C.; Salzberg, S.L. How to map billions of short reads onto genomes. Nature Biotechnology, v.
27, p. 455-457, 2009.
Wang, Z.; Gerstein, M.; Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews
Genetics, v. 10, p. 57-63, 2009.
Wang, L.; Feng, Z.; Wang, X.; Wang, X.; Zhang, X. DEGseq: an R package for identifying differentially
expressed genes from RNA-seq data. Bioinformatics, v. 26, p. 136-138, 2010.