nf-core/bactmap — Gitpedia

Introduction

nf-core/bactmap is a bioinformatics best-practice analysis pipeline for mapping short reads (Illumina) and long reads (Oxford Nanopore) from bacterial WGS to a reference sequence, creating filtered VCF files, and making pseudogenomes based on high-quality positions in the VCF files.

Pipeline summary

Index reference fasta file (short-read: BWA index or Bowtie2 build; long-read: minimap2 index)
Read QC (FastQC or falco as an alternative option)
Calculate fastq summary statistics (fastq-scan)
Perform read pre-processing (optional)
- Adapter clipping and merging (short-read: fastp or AdapterRemoval2; long-read: porechop or Porechop_ABI)
- Quality filtering (long-read: Filtlong or Nanoq)
- Run merging (cat)
Downsample fastq files (optional) (Rasusa)
Summarise read statistics pre- and post-processing and subsampling (read_stats)
Variant calling
- Map reads to reference (short-read: BWA-MEM2 or Bowtie2; long-read: minimap2)
- Sort and index alignments (SAMtools view/sort)
- Summarise alignment statistics (SAMtools stats)
- Call variants (short-read: FreeBayes; long-read: Clair3)
- Filter variants (BCFtools filter)
- Summarise variant statistics (BCFtools stats)
- Convert filtered bcf to pseudogenome fasta (BCFtools consensus and BEDtools)
- Summarise mapping statistics (seqtk)

Create alignment from pseudogenomes by concatenating fasta files having first checked that the sample sequences are high quality (alignpseudogenomes)
Extract variant sites from alignment (SNP-sites)
Present QC for raw and processed reads, alignment statistics and variant statistics (MultiQC)
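To make the short-read variant calling steps above more concrete, here is a rough sketch of how the named tools could be chained by hand. It is not the pipeline's actual implementation: the function name, file names, thread count and the QUAL>=30 filter are illustrative assumptions, and the BEDtools masking of low-quality positions is left out.

```shell
# Hypothetical helper: chains the short-read tools named in the summary.
# Arguments: reference fasta, R1 fastq, R2 fastq, sample name.
map_and_call_sketch() {
    ref="$1"; r1="$2"; r2="$3"; sample="$4"

    # Index the reference for the mapper and for the variant caller
    bwa-mem2 index "$ref"
    samtools faidx "$ref"

    # Map reads, then sort and index the alignment
    bwa-mem2 mem -t 4 "$ref" "$r1" "$r2" | samtools sort -o "${sample}.bam" -
    samtools index "${sample}.bam"
    samtools stats "${sample}.bam" > "${sample}.stats"

    # Call haploid variants, then filter them (threshold is an assumption)
    freebayes -f "$ref" -p 1 "${sample}.bam" > "${sample}.vcf"
    bcftools filter -i 'QUAL>=30' -Oz -o "${sample}.filtered.vcf.gz" "${sample}.vcf"
    bcftools index "${sample}.filtered.vcf.gz"
    bcftools stats "${sample}.filtered.vcf.gz" > "${sample}.bcftools_stats.txt"

    # Apply the filtered variants to the reference to get a pseudogenome
    bcftools consensus -f "$ref" -o "${sample}.fasta" "${sample}.filtered.vcf.gz"
}
```

With the tools installed, it could be called as `map_and_call_sketch reference.fasta s1_R1.fq.gz s1_R2.fq.gz s1` (all file names hypothetical). The pipeline itself handles these steps, so this is only meant to show what happens at each stage.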

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
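A setup check with the test profile might look like the sketch below. The docker profile and the results_test output directory are assumptions; substitute the container engine or institutional profile that applies to your system.

```shell
# Hypothetical output directory for the test run
outdir="results_test"

# Only attempt the run if Nextflow is actually installed
if command -v nextflow >/dev/null 2>&1; then
    nextflow run nf-core/bactmap -profile test,docker --outdir "$outdir"
else
    echo "nextflow not found on PATH; install it before running the test profile" >&2
fi
```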

First, prepare a samplesheet with your input data that looks as follows:

csv
sample,run_accession,instrument_platform,fastq_1,fastq_2
2612,run1,ILLUMINA,2612_run1_R1.fq.gz,
2613,run1,ILLUMINA,2612_run3_R1.fq.gz,2612_run3_R2.fq.gz
2614,run3,OXFORD_NANOPORE,2614_file1.fastq.gz,
2614,run3,OXFORD_NANOPORE,2614_file2.fastq.gz,

Each row represents a fastq file (single-end) or a pair of fastq files (paired-end), from either Illumina (short reads) or Oxford Nanopore (long reads).
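Before launching a run, it can help to sanity-check the samplesheet layout. The sketch below is not part of the pipeline: check_samplesheet is a hypothetical helper, and it assumes that ILLUMINA and OXFORD_NANOPORE (the two values shown in the example) are the only accepted platforms and that fastq_1 is always required.

```shell
# Hypothetical helper: print the number of rows that break the expected layout.
check_samplesheet() {
    awk -F',' '
        NR == 1 { next }                 # skip the header row
        NF != 5 { bad++; next }          # expect exactly five columns
        $3 != "ILLUMINA" && $3 != "OXFORD_NANOPORE" { bad++; next }
        $4 == "" { bad++; next }         # fastq_1 must be present
        END { print bad + 0 }
    ' "$1"
}

# Write the example samplesheet to a scratch location and check it
sheet="${TMPDIR:-/tmp}/bactmap_samplesheet_example.csv"
cat > "$sheet" <<'EOF'
sample,run_accession,instrument_platform,fastq_1,fastq_2
2612,run1,ILLUMINA,2612_run1_R1.fq.gz,
2613,run1,ILLUMINA,2612_run3_R1.fq.gz,2612_run3_R2.fq.gz
2614,run3,OXFORD_NANOPORE,2614_file1.fastq.gz,
2614,run3,OXFORD_NANOPORE,2614_file2.fastq.gz,
EOF

bad_rows=$(check_samplesheet "$sheet")
echo "rows failing the check: ${bad_rows}"
```

The check only looks at column count and values; it does not confirm that the fastq files exist, which the pipeline's own input validation is better placed to do.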

Additionally, if you are analysing Oxford Nanopore data, you will need to provide the path to a Clair3 model (specified with --clair3_model). Models for older chemistries and basecallers (e.g. r9.4.1) can be downloaded from here. For newer chemistries and basecallers, ONT provides models through Rerio. To download the Clair3 models from the ONT GitHub, you can use the following commands (each model will be downloaded to the folder clair3_models/<clair3_model_name>):

bash
# Clone the rerio repository
git clone https://github.com/nanoporetech/rerio

# Download all models (run the script from inside the cloned repository)
cd rerio
python3 download_model.py --clair3

Now, you can run the pipeline using:

bash
nextflow run nf-core/bactmap \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --fasta <REFERENCE_FASTA> \
   --clair3_model <PATH_TO_CLAIR3_MODEL> \
   --outdir <OUTDIR>

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
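As an illustration of the -params-file route, the sketch below writes the parameters from the run command above into a YAML file. The file name params.yaml, the reference.fasta and results paths, and the docker profile are assumptions made for the example.

```shell
# Hypothetical parameters file; keys mirror the CLI options without the leading dashes
params_file="params.yaml"
cat > "$params_file" <<'EOF'
input: "samplesheet.csv"
fasta: "reference.fasta"
clair3_model: "clair3_models/<clair3_model_name>"
outdir: "results"
EOF

# Print the matching command; remove the leading echo to actually run it
echo nextflow run nf-core/bactmap -profile docker -params-file "$params_file"
```

Keeping parameters in a file like this makes a run easier to repeat, while -c config files stay reserved for resource and executor settings.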

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/bactmap was originally written by Anthony Underwood, Andries van Tonder and Thanh Le Viet.

We thank the following people for their extensive assistance in the development
of this pipeline:

Anthony Underwood's time working on the project was funded by the National Institute for Health Research (NIHR) Global Health Research Unit for the Surveillance of Antimicrobial Resistance (Grant Reference Number 16/136/111).

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #bactmap channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.