Getting started

This page explains how to install SatTCR and describes the configuration file and the pipeline modules.

Downloading SatTCR

SatTCR is publicly available on GitHub. It can be downloaded from the command line with:

git clone git@github.com:Ong-Research/SatTCR.git {repo_name}

where ‘{repo_name}’ is the desired name for the working directory.

Installing prerequisite software

The SatTCR pipeline requires:

Snakemake: https://snakemake.readthedocs.io/en/stable/
Docker: https://www.docker.com/

SatTCR uses Snakemake to schedule the pipeline's jobs, and each job runs in its own Docker container.

Snakemake can be installed in different ways:

Using pip:
```
pip install snakemake
```
Using conda or mamba:
```
conda activate {env_name}
conda install -c bioconda snakemake
```
where {env_name} is the name of a conda environment that was created previously.

Getting Docker images

SatTCR relies on a curated list of Docker containers. Their images can be obtained with:

cd {repo_name} 

docker pull staphb/fastqc # FastQC image
docker pull staphb/multiqc # MultiQC image
docker pull staphb/trimmomatic # Trimmomatic image
docker build -t tcr/sattcr - < Dockerfile # R and Quarto image
docker pull ghcr.io/milaboratory/mixcr/mixcr:latest # MIXCR image

MIXCR requires a license, which can be obtained from https://mixcr.com/mixcr/getting-started/milm/ and saved into a file. The location of this file must be specified in the config/config.yaml file, in the mixcr key under license_file.

Configuring the SatTCR pipeline

Detailed examples on how to run SatTCR are available with the dataset:

Zuleger et al. 2024

1. Create a comma-separated values (CSV) file with two columns:

sample_name: The name of the sample.
sample_file: The prefix of the sequence files, i.e. everything before the _R1 and _R2 parts. For example, if the pair of RNA-seq files is data/sample1_R1_L001.fastq.gz and data/sample1_R2_L001.fastq.gz, then this column is data/sample1.
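
As an illustration of the two columns described above, the sample table could be generated from a list of forward-read files. This is only a sketch: the helper names and the output file name `samples.csv` are hypothetical, and writing a header row with the column names is an assumption about what the pipeline expects.

```python
import csv
import re
from pathlib import Path


def sample_prefix(r1_path):
    """Return everything before the first '_R1' in a forward-read file name."""
    match = re.match(r"(.+?)_R1", r1_path)
    if match is None:
        raise ValueError(f"no _R1 marker found in {r1_path!r}")
    return match.group(1)


def sample_rows(r1_files):
    """Build (sample_name, sample_file) rows; the sample name is taken
    from the last path component of the prefix (an assumption)."""
    rows = []
    for r1 in r1_files:
        prefix = sample_prefix(r1)
        rows.append((Path(prefix).name, prefix))
    return rows


def write_sample_table(r1_files, out_path="samples.csv"):
    """Write the rows to a CSV file (hypothetical file name)."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample_name", "sample_file"])  # header row is assumed
        writer.writerows(sample_rows(r1_files))
```

For the example files in the text, `sample_rows(["data/sample1_R1_L001.fastq.gz"])` yields the single row `("sample1", "data/sample1")`.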

2. Edit the `config/config.yaml` file. The configuration file is divided into the following sections, so that each part of the pipeline can be configured separately:

General configuration parameters:

threads: Maximum number of parallel threads used per process.
samplefile: Location of the file with the samples.
seed: Seed for the random number generation and sequence sampling performed during the saturation analysis.
run_*: Logical indicators that determine whether each stage of the pipeline is run.
suffix: The part of the sequence file names that follows the R1/R2 parts; it complements the prefixes listed in the samplefile. For example, if the pair of RNA-seq files is data/sample1_R1_L001.fastq.gz and data/sample1_R2_L001.fastq.gz, the suffix is _L001.fastq.gz.
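
To make the relationship between sample_file and suffix concrete, the sketch below rebuilds the pair of file names from the two values and reports any that are missing on disk. It assumes the names are formed as prefix + _R1/_R2 + suffix, which is inferred from the example above rather than taken from the pipeline code; both function names are hypothetical.

```python
import os


def paired_read_files(sample_file, suffix):
    """Rebuild the R1/R2 file names from a sample_file prefix and the suffix."""
    return (f"{sample_file}_R1{suffix}", f"{sample_file}_R2{suffix}")


def missing_read_files(sample_file, suffix):
    """Return the reconstructed file names that do not exist on disk."""
    return [path for path in paired_read_files(sample_file, suffix)
            if not os.path.exists(path)]
```

Calling `paired_read_files("data/sample1", "_L001.fastq.gz")` gives back the two file names used in the example, which is a quick way to check a sample table and suffix before launching the pipeline.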

Docker configuration parameters:

run_line: This is the docker command used to run every rule.
fastqc, multiqc, trimmomatic, rquarto and mixcr: The names of the images that were pulled before.

In general, it is not necessary to modify these parameters unless a different image name is used or the way Docker runs on the user’s system needs to be configured.

Trimmomatic configuration parameters:

trimmer: A vector with the trimmomatic configuration to use. More information is available at http://www.usadellab.org/cms/?page=trimmomatic. The general idea is to remove the low-quality nucleotides at the end of the sequences and to discard very short sequences.

MIXCR configuration parameters:

params: The configuration line used to control MIXCR behavior. We used the following line to assemble the clonotypes analyzed in this manuscript: `rna-seq --species dog -b imgt.202214-2 --rna`. MIXCR provides a comprehensive list of preset configurations at https://mixcr.com/mixcr/reference/overview-built-in-presets/. The imgt.202214-2 library file was downloaded from the repseqio repository release page; this file contains sequences for many species, and details are available on the IMGT website. For example, the available species for TCR \(\beta\) chains are:

Available species for TCR \(\beta\) chain

Human, Mouse, Ma’s night monkey, Rhesus Monkey, Rainbow trout, Dog, Ferret, Rabbit, Pig, Cat, Sheep, Camel, Crab-eating macaque, Naked mole-rat, Bovine, Mouse C57BL/6J, Gorilla

license_file: Location of the file with the license. The pipeline uses this file to run MIXCR in a docker container.
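
Because MIXCR cannot run without the license, it can be worth checking the license_file entry before starting a long run. The sketch below assumes the configuration has already been parsed into a Python dictionary (for example with a YAML library) and that license_file sits under the mixcr key, as described earlier; the function name is hypothetical.

```python
from pathlib import Path


def license_path(config):
    """Return the MIXCR license path from a parsed config, or raise if unusable."""
    try:
        location = config["mixcr"]["license_file"]
    except KeyError as err:
        raise KeyError("config is missing mixcr -> license_file") from err
    path = Path(location)
    if not path.is_file():
        raise FileNotFoundError(f"MIXCR license file not found: {path}")
    return path
```

The function either returns the path or fails early with a message naming the problem, instead of letting the MIXCR rule fail inside a container.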

Saturation configuration parameters:

samples: A vector with the keys of the samples on which the saturation analysis is run.
block_size or nblocks: Either the number of sequences sampled per block, or the number of blocks into which the original sequence files are split.
bootstrap_replicates: The number of times the block bootstrap sampling procedure is repeated. This rule is computationally intensive: in total, (n_blocks - 1) x n_boot_reps pairs of sequence files are sampled, and MIXCR is then run on each pair.
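
To get a feel for the cost of the saturation analysis, the count given above can be computed before choosing the parameters. In this sketch, n_blocks and n_boot_reps follow the names used in the text; the n_samples argument rests on the assumption that the count applies separately to each entry in samples, which the text does not state explicitly.

```python
def bootstrap_file_pairs(n_blocks, n_boot_reps, n_samples=1):
    """Number of sampled file pairs, which is also the number of MIXCR runs.

    Implements (n_blocks - 1) x n_boot_reps; multiplying by n_samples
    is an assumption about how the count scales with several samples.
    """
    if n_blocks < 2:
        raise ValueError("n_blocks must be at least 2")
    if n_boot_reps < 1 or n_samples < 1:
        raise ValueError("n_boot_reps and n_samples must be at least 1")
    return (n_blocks - 1) * n_boot_reps * n_samples
```

For instance, 10 blocks with 20 bootstrap replicates give `bootstrap_file_pairs(10, 20)`, i.e. 180 pairs of files and 180 MIXCR runs, so small changes to either parameter have a large effect on the running time.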

Running SatTCR modules

In the instructions below, the flag -c{k} tells Snakemake to run the rule using at most {k} cores, i.e. {k} parallel threads.

Quality control

snakemake -c{k} qc

The output of this rule is an HTML report generated with MultiQC and quality profiles generated with the R package dada2 (Callahan et al. 2016). Both analyses summarize the quality scores at each position of the sequence files.

Trim sequences

snakemake -c{k} trim

The output of this rule is the trimmed version of every raw sequence file.

Clonotype assembly with MIXCR

snakemake -c{k} mixcr

The output of this rule is a TSV file in the AIRR format (https://docs.airr-community.org/en/stable/datarep/overview.html) for every pair of RNA-seq files.

Block bootstrap sampling and saturation analysis

snakemake -c{k} sampling
snakemake -c{k} saturation

The first rule generates (n_blocks - 1) x n_boot_reps pairs of compressed fastq files, and the second rule assembles the clonotypes from each generated pair of sequence files.

Generate the report

snakemake -c{k} report

This rule produces an HTML report, compiled by Quarto, that summarizes the results of the analysis.
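
When the whole pipeline is run stage by stage, the commands above can be chained from a small script. The sketch below simply issues the same `snakemake -c{k} <rule>` calls in the order in which they are listed on this page and stops at the first failure; the function names are hypothetical, and running the stages one at a time is a choice made for illustration, not a requirement of the pipeline.

```python
import subprocess

# Rule names as listed on this page, in the order they are presented.
STAGES = ["qc", "trim", "mixcr", "sampling", "saturation", "report"]


def stage_commands(cores, stages=STAGES):
    """Build one 'snakemake -c{cores} <rule>' command per stage."""
    return [["snakemake", f"-c{cores}", stage] for stage in stages]


def run_stages(cores, stages=STAGES):
    """Run the stages in order; check=True aborts at the first failing rule."""
    for command in stage_commands(cores, stages):
        subprocess.run(command, check=True)
```

Calling `run_stages(4)` from the repository directory would run every stage with four cores; passing a shorter list, such as `["qc", "trim"]`, restricts the run to those rules.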

Generate partial reports

To generate a report only with the quality control part:

snakemake -c{k} qc_report

To generate a report with the quality control and the repertoire parts, i.e. everything except the saturation analysis:

snakemake -c{k} repertoire_report

Each of these rules generates a partial HTML report compiled by Quarto.
