Mapping and Assembly

(Bacterial Genome Analysis Piplene)

After filtering the raw reads, you can choose either of the following methods depending on the availability of reference genome or intra-species variations.
Basically, if you have a reference genome and do not expect much variation from it, then the reads are mapped to the reference. Else, de novo assembly is preferred.

Mapping to a Reference


  1. Index the reference sequence
    $ mkdir mapping
                $ cd mapping
                $ conda deactivate
                $ conda activate mappers
                $ cp ../resources/NZ_CP053256.1_A45_Chr.fasta ./Ref_A45_chr.fasta
                $ bwa index -a is Ref_A45_chr.fasta
  2. Align Reads separately
    $ cp ../bb_out/*.fastq ./
                $ bwa aln -t 12 Ref_A45_chr.fasta a45_R1.fastq > a45_R1.sai
                $ bwa aln -t 12 Ref_A45_chr.fasta a45_R2.fastq > a45_R2.sai
              
  3. Create SAM file and convert it to BAM
    $  bwa sampe Ref_A45_chr.fasta a45_R1.sai a45_R2.sai a45_R1.fastq a45_R2.fastq > a45_aln.sam
                $ samtools view -S a45_aln.sam -b -o a45_aln.bam
              
  4. Sort and index BAM file
    $ samtools sort a45_aln.bam -o a45_sorted.bam
                $ samtools index a45_sorted.bam
              
  5. Creating a consensus
    $ samtools mpileup -uf Ref_A45_chr.fasta a45_sorted.bam | bcftools call -c | vcfutils.pl vcf2fq > a45_consensus.fq
  6. Convert to fasta & change the identifier
    $ conda deactivate
                $ conda activate emboss
                $ seqret -osformat fasta a45_consensus.fq -out2 a45_consensus.fa
              
  7. Exporting unmapped reads (Optional)
    $ conda deactivate
                $ conda activate bam2fastq
                $ bam2fastq --no-aligned --force --strict -o a45_unmapped#.fq a45_sorted.bam
  8. Check the rRNA Genes (Optional)
    $ conda deactivate
                $ conda activate barrnap
                $ barrnap -o cons_rrna.fa < ../mappers/a45_consensus.fa > cons_rrna.gff
              

de novo Assembly


  1. SPAdes
    $ mkdir spades
                $ spades.py --careful --pe1-1 bb_out/a45_R1.fastq --pe1-2 bb_out/a45_R2.fastq -o spades/ --cov-cutoff auto -t 12
    Started at 11:37 pm. Ended at 11:43 pm.
    With 4 threads: 11:49 pm to 11.59 pm

  2. Results
    Check for insert size in the log file & number of contigs/scaffolds in the fasta file.
    $ grep -i 'insert' spades.log
                $ grep -i '>' scaffolds.fasta -c
              
  3. Barrnap (Optional)
    $ mkdir barrnap_out
                $ barrnap -o spades_rrna.fa < ../spades/scaffolds.fasta > spades_rrna.gff

Alternatives

IDBA
Before running IDBA, fastq files should be merged.
$ fq2fa --merge --filter ../bb_out/a45_R1.fastq ../bb_out/a45_R2.fastq a45_reads.fa
          $ idba_ud -r a45_reads.fa -o ./
        
Velvet
Usually, Velvet requires an input of k-mer length and performs assembly with that input.
VelvetOptimiser is a wrapper for Velvet, which accepts a range of k-mer and performs multiple Velvet runs over that k-mer range.
$ VelvetOptimiser.pl -s 79 -e 159 -f '-shortPaired -fastq -separate ../bb_out/a45_R1.fastq ../bb_out/a45_R2.fastq' -t 12 -d ./ -v
Unicycler
A hybrid assembly pipeline that uses SPAdes for assembling Illumina reads and miniasm+Racon for long reads. Also includes Pilon for polishing the draft genome.
$ unicycler -1 ../bb_out/a45_R1.fastq -2 ../bb_out/a45_R2.fastq -o ./ -t 12