Explanation of germline set formats

This article gives a brief explanation of the different germline set formats available from OGRDB.

Recommended for AIRR-seq Analysis

The sets available from the Download pages are tuned for AIRR-seq analysis. In particular, duplicate sequences are removed and sequences inferred from transcripts are extended to their likely full length, as inferred from other alleles of the same gene, even if this full-length sequence is not fully supported by the transcripts. These points are discussed in more detail here.

‘Source’ sets, which do contain duplicates and do not have such extensions, are also available. To find them, click on the name of a set on the Download page. You will be taken to the full information on the set. At the top of the page you will see download buttons for both the AIRR-seq Recommended and the Source sets.

JSON format

The JSON format provides full information on each sequence in the set, including alternate names, sequences of regulatory regions where available, supporting evidence, and delineation of gene features. The format follows the MiAIRR germline schema. The receptor-utils package contains a command-line tool, download_germline_set, which can download sets in a variety of formats or create those formats from an already-downloaded JSON file. It can also create all the files needed to use the germline sets with IgBlast. The receptor-utils documentation has articles on using OGRDB germline sets with IgBlast and MiXCR.
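As a rough illustration of reading the JSON file directly, the sketch below pulls allele names and sequences out of a parsed set and formats them as FASTA text. The key names used (GermlineSet, allele_descriptions, label, coding_sequence) are assumptions about the schema and should be checked against a downloaded file, and the toy record is invented for the example. For real work, download_germline_set remains the supported route.

```python
import json


def json_to_fasta(germline_json):
    """Return FASTA text built from a parsed germline-set JSON document.

    Assumes (not confirmed by this article) a top-level "GermlineSet" list
    whose entries hold an "allele_descriptions" list, each item carrying
    "label" and "coding_sequence" keys.
    """
    records = []
    for germline_set in germline_json.get("GermlineSet", []):
        for allele in germline_set.get("allele_descriptions", []):
            name = allele.get("label")
            seq = allele.get("coding_sequence")
            if name and seq:  # skip entries missing a name or sequence
                records.append(f">{name}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


# Invented toy data for illustration, not a real OGRDB record
toy = json.loads("""
{"GermlineSet": [{"allele_descriptions": [
    {"label": "IGHV-EXAMPLE*01", "coding_sequence": "caggtgcag"},
    {"label": "IGHJ-EXAMPLE*01", "coding_sequence": "actactttg"}
]}]}
""")

fasta_text = json_to_fasta(toy)
print(fasta_text)
```

To use this on a downloaded file, replace the toy document with the result of json.load on the opened file.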

FASTA formats

These are simple lists of the reference sequences. The sequence header contains just the allele name. In the ‘gapped’ set, V sequences are aligned and gapped following IMGT conventions, whereas in the ‘ungapped’ set they are not aligned. Other sequences are identical in the two sets.
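The relationship between the two sets can be sketched in a few lines of Python. This example assumes that gaps are written as '.' characters, as is usual in IMGT-gapped alignments, and uses invented sequences and allele names. It is only an illustration of the idea, not how OGRDB produces the files.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line  # join sequences wrapped over several lines
    return seqs


def ungap(seq):
    """Remove gap characters (assumed to be '.') from a sequence."""
    return seq.replace(".", "")


# Invented example records, not real germline sequences
gapped_text = """>V-EXAMPLE*01
cag...gtg
cag
>J-EXAMPLE*01
actac
"""

gapped = read_fasta(gapped_text)
ungapped = {name: ungap(seq) for name, seq in gapped.items()}

for name, seq in ungapped.items():
    print(f">{name}\n{seq}")
```

Sequences without gaps, such as the J example here, pass through unchanged, which matches the statement that non-V sequences are identical in the two sets.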

If you would like more information in the FASTA header, the receptor-germline-tools package can create FASTA files with richer headers. It can also add information from a JSON file to an existing AIRR rearrangements file.