A select set is a set of reads taken from a single repertoire that directly support a specific inference.
In outline, the process to create the set is as follows:
- For a paired-end-read dataset, merge all paired-end reads
- Align reads to the inferred allele reference
- Keep only alignments with at least 96% identity to the reference
- Keep only reads that match the novel allele at each of its SNP positions
- Create a select set matching the filtered reads (for a paired-end-read dataset, this should consist of the original unmerged reads)
A script and associated Docker image are available to perform these steps. You can use the script directly, adapt it, or follow your own procedure if you prefer.
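
If you follow your own procedure, the two filtering steps might look something like the minimal sketch below. It is not the script mentioned above. It assumes each read has already been aligned to the inferred allele and is supplied as a string of the same length as the reference, with `-` marking positions the read does not cover; those positions are ignored when computing identity, which is one of several possible conventions. The function names, the input format and the 0-based SNP positions are all illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Tuple


def identity(aligned_read: str, reference: str) -> float:
    """Fraction of covered positions at which the read matches the reference."""
    if len(aligned_read) != len(reference):
        raise ValueError("aligned read and reference must have the same length")
    # Assumption: '-' marks a position the read does not cover; skip those.
    covered = [(r, ref) for r, ref in zip(aligned_read, reference) if r != "-"]
    if not covered:
        return 0.0
    matches = sum(1 for r, ref in covered if r == ref)
    return matches / len(covered)


def matches_snps(aligned_read: str, reference: str, snp_positions: Iterable[int]) -> bool:
    """True if the read agrees with the inferred allele at every SNP position."""
    return all(aligned_read[pos] == reference[pos] for pos in snp_positions)


def select_read_ids(
    alignments: Iterable[Tuple[str, str]],
    reference: str,
    snp_positions: Sequence[int],
    min_identity: float = 0.96,
) -> List[str]:
    """Return the IDs of reads that pass both the identity and the SNP filter.

    `alignments` yields (read_id, aligned_read) pairs; `reference` is the
    inferred allele; `snp_positions` are hypothetical 0-based positions at
    which the novel allele differs from the nearest known allele.
    """
    selected = []
    for read_id, aligned_read in alignments:
        if identity(aligned_read, reference) < min_identity:
            continue
        if not matches_snps(aligned_read, reference, snp_positions):
            continue
        selected.append(read_id)
    return selected
```

The returned IDs can then be used to pull the corresponding records out of the original read files to form the select set.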