Getting started
This page explains how to install SatTCR and describes the configuration file and the pipeline modules.
Downloading SatTCR
SatTCR is publicly available on GitHub and can be downloaded from the command line with:
git clone git@github.com:Ong-Research/SatTCR.git {repo_name}
where {repo_name} is the desired name for the working directory.
Installing prerequisite software
The SatTCR pipeline requires:
- Snakemake: https://snakemake.readthedocs.io/en/stable/
- Docker: https://www.docker.com/
SatTCR uses Snakemake to schedule the pipeline jobs, and every job runs in a separate container.
Snakemake can be installed in different ways:
- Using pip:
pip install snakemake
- Using conda or mamba:
conda activate {env_name}
conda install -c bioconda snakemake
where {env_name} is the name of a conda environment that was created previously.
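Before pulling the images, it can help to confirm that both prerequisites are reachable from the shell. The snippet below is a minimal sketch: the helper name have_tool is hypothetical and not part of SatTCR, and it only checks that the programs are on PATH, not that their versions are suitable.

```shell
#!/bin/sh
# have_tool is a hypothetical helper (not part of SatTCR).
# It returns 0 if the named program is found on PATH, and non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of the two prerequisites listed above.
for tool in snakemake docker; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If either tool is reported as missing, revisit the installation steps above before continuing.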
Getting Docker images
We curated a list of Docker containers that are utilized by SatTCR. Their images can be obtained by using:
cd {repo_name}
docker pull staphb/fastqc # FastQC image
docker pull staphb/multiqc # MultiQC image
docker pull staphb/trimmomatic # Trimmomatic image
docker build -t tcr/sattcr - < Dockerfile # R and Quarto image
docker pull ghcr.io/milaboratory/mixcr/mixcr:latest # MIXCR image
MIXCR requires a license, which can be obtained from https://mixcr.com/mixcr/getting-started/milm/ and must be saved to a file. The name of this file must be specified in the config/config.yaml file, in the mixcr key under license_file.
Configuring the SatTCR pipeline
Detailed examples on how to run SatTCR are available with the dataset:
1. Create a comma-separated values (CSV) file with two columns:
- sample_name: The name of the sample.
- sample_file: The prefix of the file names up to, but not including, the _R1 and _R2 parts. For example, if the pair of RNA-seq files is data/sample1_R1_L001.fastq.gz and data/sample1_R2_L001.fastq.gz, then this column is data/sample1.
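As an illustration, a sample file with two samples could be written as in the sketch below. The file name samples.csv, the sample names and the data/ prefixes are hypothetical, and the presence of a header row with the two column names is an assumption; check the example dataset for the exact layout expected by the pipeline.

```shell
#!/bin/sh
# Hypothetical output path; it should match the `samplefile` entry in config/config.yaml.
samplefile="${1:-samples.csv}"

# Assumption: the first row holds the column names described above.
# Each following row is one sample: its name and its file prefix.
cat > "$samplefile" <<'EOF'
sample_name,sample_file
sample1,data/sample1
sample2,data/sample2
EOF

cat "$samplefile"
```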
2. Edit the config/config.yaml file. The configuration file is divided into the following sections, so that each part of the pipeline can be configured separately:
General configuration parameters:
- threads: Maximum number of parallel threads used per process.
- samplefile: Location of the file with the samples.
- seed: Seed for random number generation and sequence sampling during the saturation analysis.
- run_*: Logical indicators that determine whether each stage of the pipeline is run.
- suffix: Used together with the samplefile. If the pair of RNA-seq files is data/sample1_R1_L001.fastq.gz and data/sample1_R2_L001.fastq.gz, the suffix is the part that remains after the R1/R2 parts, i.e. _L001.fastq.gz.
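The relationship between the sample_file prefix and the suffix can be sketched as follows. The function name fastq_pair is hypothetical, and the sketch assumes that each file name is simply the prefix, then _R1 or _R2, then the suffix, as in the example file names above.

```shell
#!/bin/sh
# fastq_pair is a hypothetical helper (not part of SatTCR).
# It prints the two file names implied by a sample_file prefix and a suffix,
# assuming the pattern: prefix + "_R1" or "_R2" + suffix.
fastq_pair() {
    prefix=$1
    suffix=$2
    printf '%s_R1%s\n%s_R2%s\n' "$prefix" "$suffix" "$prefix" "$suffix"
}

# Example from the text: prefix data/sample1 and suffix _L001.fastq.gz
fastq_pair data/sample1 _L001.fastq.gz
```

Running the sketch prints data/sample1_R1_L001.fastq.gz and data/sample1_R2_L001.fastq.gz, which is a quick way to check that the prefix and suffix chosen for the configuration match the files on disk.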
Docker configuration parameters:
- run_line: The docker command used to run every rule.
- fastqc, multiqc, trimmomatic, rquarto and mixcr: The names of the images that were pulled before.
In general, it is not necessary to modify these parameters unless a different image name is used or there is a specific need to configure how Docker runs on the user’s system.
Trimmomatic configuration parameters:
- trimmer: A vector with the trimmomatic configuration to use. More information is available at http://www.usadellab.org/cms/?page=trimmomatic. The general idea is to remove low-quality nucleotides at the end of the sequences and to remove very short sequences.
MIXCR configuration parameters:
- params: The configuration line used to control MIXCR behavior. We used the line below to assemble the clonotypes analyzed in this manuscript:
rna-seq --species dog -b imgt.202214-2 --rna
MIXCR provides a comprehensive list of preset configurations at https://mixcr.com/mixcr/reference/overview-built-in-presets/. The imgt.202214-2 library file was downloaded from the repseqio repository release page; this file contains sequences for many species, and details are available on the IMGT website. For example, the available species for TCR \(\beta\) chains are:
Human, Mouse, Ma’s night monkey, Rhesus Monkey, Rainbow trout, Dog, Ferret, Rabbit, Pig, Cat, Sheep, Camel, Crab-eating macaque, Naked mole-rat, Bovine, Mouse C57BL/6J, Gorilla
- license_file: Location of the file with the license. The pipeline uses this file to run MIXCR in a docker container.
Saturation configuration parameters:
- samples: A vector with the sample keys for which the saturation analysis will be run.
- block_size or nblocks: Either the number of sequences sampled per block, or the number of blocks used to split the original sequence files.
- bootstrap_replicates: The number of times the block bootstrap sampling procedure is repeated. This rule is computationally intensive because, in total, n_blocks-1 x n_boot_reps pairs of sequence files are sampled and MIXCR is then run on each pair of files.
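To get a feel for the cost of the saturation analysis before choosing these parameters, the number of sampled file pairs can be computed directly from the expression above. The function name n_sampled_pairs and the values 10 and 5 are hypothetical and only illustrate the arithmetic for a single sample.

```shell
#!/bin/sh
# n_sampled_pairs is a hypothetical helper (not part of SatTCR).
# It evaluates (n_blocks - 1) x n_boot_reps, the number of sequence file
# pairs generated by the sampling step, each of which is processed with MIXCR.
n_sampled_pairs() {
    n_blocks=$1
    n_boot_reps=$2
    echo $(( ($n_blocks - 1) * $n_boot_reps ))
}

# Hypothetical settings: 10 blocks and 5 bootstrap replicates.
n_sampled_pairs 10 5   # prints 45
```

With these illustrative settings MIXCR would be run 45 times for one sample, so it is worth starting with small values when testing a configuration.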
Running SatTCR modules
In the instructions below, the flag -c{k} stands for running the rule with {k} parallel threads.
Quality control
snakemake -c{k} qc
The outputs of this rule are an HTML report generated with MultiQC and quality profiles generated with the R package dada2 (Callahan et al. 2016). Each of these analyses depicts quality score summaries at each position of the sequence files.
Trim sequences
snakemake -c{k} trim
The output of this rule is a trimmed version of every raw sequence file.
Clonotype assembly with MIXCR
snakemake -c{k} mixcr
The output of this rule is a TSV file in the AIRR format (https://docs.airr-community.org/en/stable/datarep/overview.html) for every pair of RNA-seq files.
Block bootstrap sampling and saturation analysis
snakemake -c{k} sampling
snakemake -c{k} saturation
The first rule generates n_blocks-1 x n_boot_reps pairs of compressed fastq files, and the second rule assembles the clonotypes from each pair of generated sequence files.
Generate the report
snakemake -c{k} report
This rule produces an HTML report, compiled by Quarto, that summarizes the results of the analysis.
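The modules above can also be chained from a small script. The sketch below only prints the commands, in the order in which the modules are described on this page; the function name sattcr_commands is hypothetical, and whether each stage must be invoked separately or is triggered automatically by a later rule depends on the pipeline's Snakefile, which is not covered here.

```shell
#!/bin/sh
# sattcr_commands is a hypothetical helper (not part of SatTCR).
# It prints one snakemake command per module, in the order described above,
# using the given number of parallel threads.
sattcr_commands() {
    cores=$1
    for target in qc trim mixcr sampling saturation report; do
        echo "snakemake -c$cores $target"
    done
}

# Print the commands for 4 parallel threads without running anything.
sattcr_commands 4
```

To execute the commands rather than print them, the output can be piped to a shell from inside the working directory, e.g. sattcr_commands 4 | sh. Snakemake's -n flag can be added to a command to preview the jobs as a dry run.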
Generate partial reports
- To generate a report only with the quality control part:
snakemake -c{k} qc_report
- To generate a report with the quality control and the repertoire parts, i.e. everything except the saturation analysis:
snakemake -c{k} repertoire_report
These rules generate a partial HTML report compiled by Quarto.