K-mers & De Bruijn Graphs
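k-mers are substrings of length k; a sequence of length L contains L - k + 1 overlapping k-mers. Example: the 3-mers of ATGCA are ATG, TGC and GCA.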
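The definition above can be sketched in a few lines of Python. The function name `kmers` and the test sequence are illustrative choices, not something taken from the original material.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k, left to right."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(kmers("ATGCGT", 3))  # ['ATG', 'TGC', 'GCG', 'CGT']
```

A sequence of length 6 with k=3 gives 6 - 3 + 1 = 4 k-mers, each shifted one base from the previous one.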
Why k-mers are required
De Bruijn graph: a directed graph representing overlaps between sequences of symbols
Example
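A minimal sketch of one common way to build such a graph from a sequence. It assumes the convention usually used in genome assembly, where nodes are (k-1)-mers and each k-mer contributes one edge from its prefix to its suffix. The function name `de_bruijn` and the toy sequence are hypothetical.

```python
from collections import defaultdict


def de_bruijn(seq, k):
    """Map each (k-1)-mer to the list of (k-1)-mers it points to.

    Every k-mer of seq contributes one edge: prefix -> suffix.
    Repeated k-mers produce repeated edges (a multigraph).
    """
    graph = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


for node, successors in de_bruijn("ATGATGC", 3).items():
    print(node, "->", ", ".join(successors))
```

In this toy sequence the 3-mer ATG occurs twice, so the edge AT -> TG appears twice, and node TG branches to both GA and GC. Branches like this are where an assembler has to decide which path to follow.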
FastQC: a quality control tool. Works on FASTQ, SAM and BAM files
Conda environment: qc
$ cd Desktop/bgap
$ conda deactivate
$ conda activate qc
$ fastqc
$ mkdir bb_out
$ cd bb_out
$ bbduk.sh in1=../reads/a45_R1.fastq in2=../reads/a45_R2.fastq out1=a45_R1.fastq out2=a45_R2.fastq ref=adapters.fa k=23 mink=7 ktrim=r hdist=1 qtrim=r trimq=20 minlen=100 tpe tbo
Input: 1741880 reads 436749684 bases.
QTrimmed: 1522186 reads (87.39%) 103642774 bases (23.73%)
KTrimmed: 376743 reads (21.63%) 13100754 bases (3.00%)
Trimmed by overlap: 8692 reads (0.50%) 88862 bases (0.02%)
Total Removed: 170588 reads (9.79%) 116832390 bases (26.75%)
Result: 1571292 reads (90.21%) 319917294 bases (73.25%)
$ fastqc
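The BBDuk summary above can be cross-checked with a little arithmetic. The sketch below copies the figures from that report; the variable names and the `pct` helper are illustrative only.

```python
# Figures copied from the BBDuk summary above.
input_reads, input_bases = 1741880, 436749684
removed_reads, removed_bases = 170588, 116832390
# Bases trimmed by each step: quality, k-mer (adapter), overlap.
trimmed_bases = [103642774, 13100754, 88862]


def pct(part, whole):
    """Percentage of whole, rounded to two decimals like the report."""
    return round(100 * part / whole, 2)


kept_reads = input_reads - removed_reads
kept_bases = input_bases - removed_bases

print(kept_reads, pct(kept_reads, input_reads))   # 1571292 90.21
print(kept_bases, pct(kept_bases, input_bases))   # 319917294 73.25
print(sum(trimmed_bases) == removed_bases)        # True
```

In this run the bases trimmed by the three steps add up exactly to the total removed, while the per-step read counts do not add up to the removed reads. That is consistent with those lines counting reads that were trimmed, with only reads falling below minlen=100 being discarded, though this reading of the report is an interpretation rather than something stated in the output.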
$ mkdir trim_out
$ cd trim_out
$ trimmomatic PE -phred33 ../reads/a45_R1.fastq ../reads/a45_R2.fastq a45_R1_paired.fq.gz a45_R1_unpaired.fq.gz a45_R2_paired.fq.gz a45_R2_unpaired.fq.gz ILLUMINACLIP:../adapters.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:100
$ fastqc
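The SLIDINGWINDOW:4:20 step in the Trimmomatic command above scans each read from the 5' end and cuts it once the average quality in a 4-base window drops below 20. The sketch below is a simplified illustration of that idea, not Trimmomatic's actual code: it cuts at the start of the first failing window, and the real tool may place the cut differently. The helper names are hypothetical.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string with Phred+33 encoding into scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(quals, window=4, threshold=20):
    """Return how many leading bases to keep.

    Simplified: cut at the start of the first window whose mean
    quality is below threshold; keep everything if no window fails.
    """
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < threshold:
            return start
    return len(quals)


quals = phred33("IIIIIIII####")  # 'I' decodes to 40, '#' decodes to 2
print(sliding_window_trim(quals))  # 7
```

In this toy read, the first window whose mean falls below 20 starts at position 7, so 7 bases are kept. With MINLEN:100 as in the command above, a read that short would then be dropped.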