Creating a select set – VDJbase and OGRDB

A select set is a set of reads taken from a single repertoire that directly support a specific inference.

In outline, the process to create the set is as follows:

1. For a paired-end-read dataset, merge each read pair into a single sequence
2. Align the reads to the inferred allele reference
3. Filter the alignments, keeping only those with an identity of at least 96%
4. Filter the remaining reads, keeping only those that match the novel allele’s SNPs
5. Create a select set from the reads that pass both filters (for a paired-end-read dataset, this should consist of the original, unmerged reads)
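
The two filtering steps can be sketched in Python as follows. This is a minimal illustration, not the script mentioned below: it assumes the reads have already been aligned to the inferred allele reference without gaps, and the names AlignedRead, identity, supports_snps and select_set are hypothetical. SNPs are given as a mapping from 0-based reference position to the expected base.

```python
from typing import Dict, List, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical record for a read aligned to the inferred allele reference."""
    read_id: str
    start: int      # 0-based reference position where the read begins
    sequence: str   # read bases, assumed to be aligned without gaps


def identity(read: AlignedRead, reference: str) -> float:
    """Fraction of read bases that match the reference at the aligned position."""
    ref_segment = reference[read.start:read.start + len(read.sequence)]
    if not ref_segment:
        return 0.0
    matches = sum(a == b for a, b in zip(read.sequence, ref_segment))
    # Bases that overhang the end of the reference count as mismatches.
    return matches / len(read.sequence)


def supports_snps(read: AlignedRead, snps: Dict[int, str]) -> bool:
    """True if the read covers every SNP position and carries the expected base."""
    for position, base in snps.items():
        offset = position - read.start
        if not 0 <= offset < len(read.sequence):
            return False  # the read does not cover this SNP
        if read.sequence[offset] != base:
            return False
    return True


def select_set(reads: List[AlignedRead], reference: str,
               snps: Dict[int, str], min_identity: float = 0.96) -> List[str]:
    """Return the IDs of reads that pass the identity filter and the SNP filter."""
    return [
        read.read_id
        for read in reads
        if identity(read, reference) >= min_identity and supports_snps(read, snps)
    ]
```

The function returns read IDs rather than sequences, so that for a paired-end-read dataset the IDs can be used to pull the original, unmerged reads from the source files.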

A script and an associated Docker image have been created to perform these steps. You can use the script directly, adapt it to your needs, or follow your own procedure if you prefer.