Read mapping
In this step, we will align our reads to the E. coli reference genome. Any standard short-read aligner, such as BWA or Bowtie, can be used for this step. We will use Bowtie 1. For short reads (< 50 bp), Bowtie 1 can be faster and more sensitive than Bowtie 2.
Input Data:
Input | Description | Location |
---|---|---|
Sequence reads | Raw or quality-filtered reads | iplantcollaborative > example_data > chipseq_webinar > fastqfiles |
Run Bowtie 1 in the CyVerse Discovery Environment
- Click on the “Apps” tab in the Discovery Environment and search for “bowtie”.
- Click on the app icon. The Bowtie build and map app builds a Bowtie index and then maps reads to it.
- Change the analysis name and output folder as needed, or leave the defaults.
- In the bowtie build Input section, provide the reference genome file in FASTA format. Browse the data store and select the E. coli reference genome GCF_000005845.2_ASM584v2_genomic.fna. This file is provided with the sample dataset: iplantcollaborative > example_data > chipseq_webinar > ecoli_refgenome.
- In the bowtie build Output section, enter ‘ecoli’ as the index basename. In the bowtie map reference inputs section, enter the same index basename, ‘ecoli’.
- In the bowtie map read Input section, provide the ChIP sequence reads file SRR576933_IP.fastq. Input data location: iplantcollaborative > example_data > chipseq_webinar > fastqfiles.
- Provide an output file name and launch the analysis, then click on “Analyses” to check the status of your job. When the analysis completes, open the three-dots menu on the right and click ‘Go to output folder’ to access your output files. Repeat the same steps for the control dataset SRR576933_control.fastq.
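For readers who want to see what the app's two stages correspond to outside the Discovery Environment, the following is a minimal sketch that only assembles the equivalent command lines and prints them. It assumes a local Bowtie 1 installation that provides `bowtie-build` and `bowtie` and accepts the index basename as a positional argument (newer releases also offer `-x`; check `bowtie --help` for your version). The helper name `bowtie1_commands` and the output name `bowtieout_IP.sam` are hypothetical, and the exact options the CyVerse app passes are not shown in this tutorial.

```python
import shlex


def bowtie1_commands(reference_fasta, index_basename, reads_fastq, output_sam):
    """Return two argument lists: the index build, then the read mapping.

    Hypothetical helper for illustration; it does not run anything.
    """
    # Stage 1: build the index files (<basename>.*.ebwt) from the FASTA file.
    build_cmd = ["bowtie-build", reference_fasta, index_basename]
    # Stage 2: map single-end reads against that index.
    # -S asks Bowtie 1 to write alignments in SAM format.
    map_cmd = ["bowtie", "-S", index_basename, reads_fastq, output_sam]
    return build_cmd, map_cmd


if __name__ == "__main__":
    build_cmd, map_cmd = bowtie1_commands(
        "GCF_000005845.2_ASM584v2_genomic.fna",  # reference from the sample dataset
        "ecoli",                                 # index basename used in the steps above
        "SRR576933_IP.fastq",                    # ChIP reads
        "bowtieout_IP.sam",                      # hypothetical output name
    )
    print(shlex.join(build_cmd))
    print(shlex.join(map_cmd))
```

The same index basename appears in both commands, which mirrors why the app asks for ‘ecoli’ twice. To run the commands for real, each list could be passed to `subprocess.run(cmd, check=True)`.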
Sequencing depth
Effective analysis of ChIP-seq data requires sufficient coverage by sequence reads (sequencing depth). The required depth depends mainly on the size of the genome and on the number and size of the protein's binding sites. ENCODE’s guidelines recommend a minimum of 10 million uniquely mapping reads per replicate experiment for mammalian genomes (Landt et al., 2012).
Note
If at least 10 million uniquely mapping reads are required for the human genome, what is the minimum number of reads required for the E. coli dataset to have sufficient coverage for further analysis?
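One rough way to approach this question is to scale the mammalian figure by the ratio of genome sizes. The sketch below does only that arithmetic. The genome sizes are assumptions (about 3.1 Gb for human and about 4.6 Mb for E. coli K-12), the function name is hypothetical, and the estimate ignores the number and size of binding sites, so it should be read as a lower bound for discussion, not a recommendation.

```python
def scale_reads_by_genome_size(reference_reads, reference_genome_bp, target_genome_bp):
    """Scale a read-count guideline linearly with genome size (naive estimate)."""
    return reference_reads * target_genome_bp / reference_genome_bp


HUMAN_GENOME_BP = 3.1e9   # assumption: approximate human genome size
ECOLI_GENOME_BP = 4.6e6   # assumption: approximate E. coli K-12 genome size
ENCODE_MIN_READS = 10e6   # guideline quoted above for mammalian genomes

estimate = scale_reads_by_genome_size(ENCODE_MIN_READS, HUMAN_GENOME_BP, ECOLI_GENOME_BP)
print(f"Naive minimum for E. coli: about {estimate:,.0f} uniquely mapping reads")
```

Under these assumptions the estimate comes out in the low tens of thousands of reads, far below what a typical sequencing run produces for a bacterial sample.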
Output/Results
Output | Description | Example |
---|---|---|
Alignment files | Alignment files in SAM format | bowtieout_control.sam |
Description of output and results
By default, the Bowtie build and map app writes its output in SAM (Sequence Alignment/Map) format. For more details, see the SAM format specification.
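To get a first feel for a SAM file, it can help to count how many records are aligned. The sketch below is a minimal, standard-library-only illustration: it skips header lines (which start with `@`), reads the FLAG value from the second tab-separated column, and treats a record as unmapped when bit 0x4 is set. The function name and the three toy records are made up for illustration, and the count equals the number of reads only if the aligner reported at most one alignment per read.

```python
def count_mapped_reads(sam_lines):
    """Count mapped and unmapped alignment records in an iterable of SAM lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines such as @HD, @SQ, @PG.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & 0x4:         # bit 0x4 set means the read is unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Toy records (not real data): one header line, two mapped reads, one unmapped read.
toy_sam = [
    "@SQ\tSN:chr\tLN:100",
    "read1\t0\tchr\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr\t20\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_reads(toy_sam))
```

For a real output file, the same function can be given an open file handle, for example `with open("bowtieout_control.sam") as handle: print(count_mapped_reads(handle))`.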