anV: Mapping & Assembly

Mapping and Assembly

(Bacterial Genome Analysis Piplene)

After filtering the raw reads, you can choose either of the following methods depending on the availability of reference genome or intra-species variations.
Basically, if you have a reference genome and do not expect much variation from it, then the reads are mapped to the reference. Else, de novo assembly is preferred.

Mapping to a Reference

Index the reference sequence

$ mkdir mapping
            $ cd mapping
            $ conda deactivate
            $ conda activate mappers
            $ cp ../resources/NZ_CP053256.1_A45_Chr.fasta ./Ref_A45_chr.fasta
            $ bwa index -a is Ref_A45_chr.fasta

Align Reads separately

$ cp ../bb_out/*.fastq ./
            $ bwa aln -t 12 Ref_A45_chr.fasta a45_R1.fastq > a45_R1.sai
            $ bwa aln -t 12 Ref_A45_chr.fasta a45_R2.fastq > a45_R2.sai

Create SAM file and convert it to BAM

$  bwa sampe Ref_A45_chr.fasta a45_R1.sai a45_R2.sai a45_R1.fastq a45_R2.fastq > a45_aln.sam
            $ samtools view -S a45_aln.sam -b -o a45_aln.bam

Sort and index BAM file

$ samtools sort a45_aln.bam -o a45_sorted.bam
            $ samtools index a45_sorted.bam

Creating a consensus

$ samtools mpileup -uf Ref_A45_chr.fasta a45_sorted.bam | bcftools call -c | vcfutils.pl vcf2fq > a45_consensus.fq

Convert to fasta & change the identifier

$ conda deactivate
            $ conda activate emboss
            $ seqret -osformat fasta a45_consensus.fq -out2 a45_consensus.fa

Exporting unmapped reads (Optional)

$ conda deactivate
            $ conda activate bam2fastq
            $ bam2fastq --no-aligned --force --strict -o a45_unmapped#.fq a45_sorted.bam

Check the rRNA Genes (Optional)

$ conda deactivate
            $ conda activate barrnap
            $ barrnap -o cons_rrna.fa < ../mappers/a45_consensus.fa > cons_rrna.gff

de novo Assembly

SPAdes

$ mkdir spades
            $ spades.py --careful --pe1-1 bb_out/a45_R1.fastq --pe1-2 bb_out/a45_R2.fastq -o spades/ --cov-cutoff auto -t 12

Started at 11:37 pm. Ended at 11:43 pm.
With 4 threads: 11:49 pm to 11.59 pm

Results
Check for insert size in the log file & number of contigs/scaffolds in the fasta file.

$ grep -i 'insert' spades.log
            $ grep -i '>' scaffolds.fasta -c

Barrnap (Optional)

$ mkdir barrnap_out
            $ barrnap -o spades_rrna.fa < ../spades/scaffolds.fasta > spades_rrna.gff

Alternatives

IDBA
Before running IDBA, fastq files should be merged.

$ fq2fa --merge --filter ../bb_out/a45_R1.fastq ../bb_out/a45_R2.fastq a45_reads.fa
          $ idba_ud -r a45_reads.fa -o ./

Velvet
Usually, Velvet requires an input of k-mer length and performs assembly with that input.
VelvetOptimiser is a wrapper for Velvet, which accepts a range of k-mer and performs multiple Velvet runs over that k-mer range.

$ VelvetOptimiser.pl -s 79 -e 159 -f '-shortPaired -fastq -separate ../bb_out/a45_R1.fastq ../bb_out/a45_R2.fastq' -t 12 -d ./ -v

Unicycler
A hybrid assembly pipeline that uses SPAdes for assembling Illumina reads and miniasm+Racon for long reads. Also includes Pilon for polishing the draft genome.

$ unicycler -1 ../bb_out/a45_R1.fastq -2 ../bb_out/a45_R2.fastq -o ./ -t 12