Read mapping

In this step, we will align our reads to the E. coli reference genome. Any standard short-read alignment program, such as BWA or Bowtie, can be used for this step. We will use Bowtie 1 because, for short reads (< 50 bp), it can be faster and more sensitive than Bowtie 2.

Input Data:

Input	Description	Location
Sequence reads	Raw or quality-filtered reads	iplantcollaborative > example_data > chipseq_webinar > fastqfiles

Run Bowtie 1 in the CyVerse Discovery Environment

Click on the “Apps” tab in the Discovery Environment and search for “bowtie”.
Click on the app icon. The Bowtie build and map app builds a Bowtie index and then maps reads.

Change the name of the analysis and the output folder as needed, or leave the defaults.
In the bowtie build Input section, provide the reference genome file in FASTA format. Browse the Data Store and select the E. coli reference genome GCF_000005845.2_ASM584v2_genomic.fna. This file is provided with the sample dataset: iplantcollaborative > example_data > chipseq_webinar > ecoli_refgenome.
In the bowtie build Output section, enter ‘ecoli’ as the index base name. In the bowtie map reference inputs section, enter the same index base name, ‘ecoli’.
In the bowtie map read Input section, provide the ChIP sequence reads file SRR576933_IP.fastq. Input data location: iplantcollaborative > example_data > chipseq_webinar > fastqfiles.
Provide an output file name, launch the analysis, and click on Analyses to check the status of your job. When the analysis completes, click on the three-dot menu on the right and select ‘Go to output folder’ to access your output files. Repeat the same steps for the control dataset, SRR576933_control.fastq.
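For readers who want to see roughly what the app does, the sketch below assembles the two command lines that a build-then-map run with Bowtie 1 would typically use. It is an illustration only: the exact options the Discovery Environment app passes are not documented here, the output name bowtieout_IP.sam is hypothetical, and the commands are only printed, not executed. It assumes the classic Bowtie 1 positional syntax (index base name, reads file, output file) with -S for SAM output.

```python
def bowtie1_commands(reference_fasta, index_basename, reads_fastq, output_sam):
    """Return the build and map command lines as argument lists (nothing is run)."""
    # Step 1: build a Bowtie index from the reference genome in FASTA format.
    build = ["bowtie-build", reference_fasta, index_basename]
    # Step 2: map the reads against that index; -S requests SAM output.
    align = ["bowtie", "-S", index_basename, reads_fastq, output_sam]
    return build, align


build, align = bowtie1_commands(
    "GCF_000005845.2_ASM584v2_genomic.fna",  # reference genome from the tutorial
    "ecoli",                                 # index base name used in the app
    "SRR576933_IP.fastq",                    # ChIP reads from the tutorial
    "bowtieout_IP.sam",                      # hypothetical output file name
)
print(" ".join(build))
print(" ".join(align))
```

Using the same index base name in both commands mirrors the app, where ‘ecoli’ is entered once in the build section and again in the map section.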

Sequencing depth

Effective analysis of ChIP-seq data requires sufficient coverage by sequence reads (sequencing depth). The required depth depends mainly on the size of the genome and on the number and size of the protein's binding sites. ENCODE’s guidelines recommend a minimum of 10 million uniquely mapped reads per replicate for mammalian genomes (Landt et al., 2012).

Note

If at least 10 million uniquely mapped reads are required for the human genome, what is the minimum number of reads required for the E. coli dataset to have sufficient coverage for further analysis?
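One naive way to approach this question is to scale the read count in proportion to genome size. The sketch below does that arithmetic. It is a rough estimate under stated assumptions: the genome sizes are not given in this tutorial (about 3.1 Gb for human and 4,641,652 bp for E. coli K-12 MG1655 are assumed), and linear scaling ignores the number and size of binding sites, which also affect the required depth.

```python
def scaled_read_count(reference_reads, reference_genome_bp, target_genome_bp):
    """Scale a read-count requirement linearly with genome size."""
    return reference_reads * target_genome_bp / reference_genome_bp


HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size
ECOLI_GENOME_BP = 4_641_652      # assumption: E. coli K-12 MG1655 genome length

min_reads = scaled_read_count(10_000_000, HUMAN_GENOME_BP, ECOLI_GENOME_BP)
print(f"Proportional estimate: about {min_reads:,.0f} uniquely mapped reads")
```

Under these assumptions the estimate comes out at roughly 15,000 reads, which is best treated as a lower bound rather than a target.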

Output/Results

Output	Description	Example
Alignment files	Alignment files in SAM format	bowtieout_control.sam

Description of output and results

By default, the Bowtie build and map app provides output files in SAM format, which stands for Sequence Alignment/Map format. For more details, see the SAM format specification.
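As a small illustration of how a SAM file is laid out, the sketch below counts mapped reads from SAM text. It relies on two features of the format: header lines start with "@", and the second tab-separated field of each alignment line is the FLAG, in which bit 0x4 marks an unmapped read. The example records, the read names, and the reference name "ref" are made up for the demonstration. On a real file, the same function can be given an open file handle.

```python
def count_mapped(sam_lines):
    """Return (mapped, total) read counts from an iterable of SAM lines."""
    mapped = total = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which begin with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second column
        total += 1
        if not (flag & 0x4):   # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


seq, qual = "A" * 36, "I" * 36
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:ref\tLN:4641652",
    f"read1\t0\tref\t100\t255\t36M\t*\t0\t0\t{seq}\t{qual}",   # mapped, forward
    f"read2\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",           # unmapped
    f"read3\t16\tref\t2500\t255\t36M\t*\t0\t0\t{seq}\t{qual}", # mapped, reverse
]
mapped, total = count_mapped(example_sam)
print(f"{mapped} of {total} reads mapped")
```

A count like this gives a quick check of whether an alignment has enough mapped reads for the depth discussed above.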
