Lecture 02B
DNA sequencing
Methodology
Date: Jan 16, 2024
This session focuses on the practical steps and computational tools required for analyzing sequencing data. Students will engage in hands-on activities using Python and Galaxy to process sequencing datasets, assess data quality, and clean reads for downstream analyses.
Learning objectives
After today, you should have a better understanding of:
- Sequencing data formats such as FASTA and FASTQ.
- Assessing sequencing data quality.
- Cleaning and preprocessing sequencing data.
Outline
Sequencing data formats
Familiarity with FASTA and FASTQ formats is essential for analyzing sequencing data effectively.
- Explore the structure and key components of FASTA and FASTQ files.
- Load sample FASTQ files and inspect their content using Python.
- Demonstrate how FastQC can analyze the quality of FASTQ files.
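As a minimal sketch of inspecting FASTQ content with Python, the example below reads records from a text stream. It assumes the common layout of exactly four lines per record (header starting with `@`, sequence, separator starting with `+`, quality string) with no line wrapping. The function name `read_fastq` and the two reads are made up for illustration; they are not part of the course materials.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            # End of file (or a trailing blank line): stop reading.
            break
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence and quality lengths differ for: {header!r}")
        # Drop the leading '@' so only the read identifier is returned.
        yield header[1:], sequence, quality


# Hypothetical two-read dataset held in memory instead of a file on disk.
example_fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"

records = list(read_fastq(io.StringIO(example_fastq)))
for identifier, sequence, quality in records:
    print(identifier, sequence, quality)
```

To read a real file, the same function could be given an open file object, for example `with open(path) as handle: ...`, where `path` points to an uncompressed FASTQ file.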
Assessing sequencing data quality
Quality control identifies potential issues in sequencing data that could affect downstream analysis.
- Learn the role of FastQC in assessing sequence quality.
- Use Python to calculate basic quality metrics from FASTQ files, such as average quality scores.
- Compare FastQC results with Python-generated metrics to reinforce the importance of automated tools.
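One way to compute an average quality score in Python is sketched below. It assumes Phred+33 encoding, where each quality character maps to a score of `ord(character) - 33`; older datasets may use a different offset. The helper names `phred_scores` and `mean_quality` and the quality strings are illustrative only, and the plain arithmetic mean used here is a simplification that will not reproduce every FastQC module.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(character) - offset for character in quality]


def mean_quality(quality, offset=33):
    """Return the arithmetic mean Phred score of one read."""
    scores = phred_scores(quality, offset)
    if not scores:
        raise ValueError("Quality string is empty")
    return sum(scores) / len(scores)


# Made-up quality strings: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
high_quality = "IIII"
mixed_quality = "!!5I"

print(phred_scores(mixed_quality))
print(mean_quality(high_quality), mean_quality(mixed_quality))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong.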
Cleaning and preprocessing sequencing data
Data cleaning removes contaminants and improves the reliability of sequencing data.
- Introduce the need for adapter trimming and quality filtering with Fastp.
- Demonstrate Fastp’s preprocessing capabilities and interpret its output.
- Write Python scripts to simulate basic preprocessing, such as filtering low-quality reads or trimming sequences.
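The preprocessing ideas above can be simulated with a short script. This is a deliberately simplified sketch, not a reimplementation of Fastp: it trims low-quality bases from the 3' end of each read and then discards reads that are too short or have a low mean quality. Phred+33 encoding is assumed, and the function names, thresholds, and reads are hypothetical values chosen for illustration.

```python
def trim_low_quality_tail(sequence, quality, min_score=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_score."""
    end = len(quality)
    while end > 0 and ord(quality[end - 1]) - offset < min_score:
        end -= 1
    return sequence[:end], quality[:end]


def passes_filter(quality, min_mean=20, min_length=3, offset=33):
    """Keep a read only if it is long enough and its mean Phred score is high enough."""
    if len(quality) < min_length:
        return False
    scores = [ord(character) - offset for character in quality]
    return sum(scores) / len(scores) >= min_mean


# Made-up reads as (identifier, sequence, quality) tuples.
reads = [
    ("read1", "ACGTAC", "IIII5!"),
    ("read2", "GGCA", "!!5I"),
    ("read3", "TTGACA", "III!!!"),
]

cleaned = []
for identifier, sequence, quality in reads:
    # Trim first, then filter on what remains of the read.
    sequence, quality = trim_low_quality_tail(sequence, quality)
    if passes_filter(quality):
        cleaned.append((identifier, sequence, quality))

for record in cleaned:
    print(record)
```

Trimming before filtering lets a read with a poor tail survive if the remaining bases are good, which is why `read3` is kept in shortened form while `read2`, whose low-quality bases are at the 5' end, is dropped.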
Supplementary material
The following pages cover content relevant to today's lecture.
- FASTA files
- FASTQ files
- FastQC and nested content
Presentation
- View: slides.com/aalexmmaldonado/biosc1540-l02b
- Live link: slides.com/d/xoclcZw/live
- Download: biosc1540-l02b.pdf