Read support for genomic alleles

The annotation file for a genomic sample can be found by clicking the folder icon in the VDJbase Samples tab. This file lists the alleles that were identified in the sample and includes read support statistics for each discovered allele. The meaning of these statistics is given in the table below.

In the Genes tab of VDJbase, the ‘Max cov sample’ column shows, for each allele, the sample in which it was found with the greatest supporting read depth. ‘Max coverage’ gives, for that sample, the number of reads that exactly matched the allele and covered its full length. Together, these columns provide a quick way to find a sample with good supporting statistics for the allele.
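Choosing the ‘Max cov sample’ amounts to picking, for each allele, the sample with the largest supporting read count. The Python sketch below illustrates that selection over hypothetical (allele, sample, count) records. It assumes the count is the number of full-length, exactly matching reads (comparable to the Fully_Spanning_Reads_100%_Match column described below); this is an assumption for illustration, not a description of VDJbase's own code, and the allele and sample names are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (allele, sample, full-length exact-match read count).
# Assumption: the count plays the role of 'Max coverage' in the Genes tab.
Record = Tuple[str, str, int]


def max_cov_samples(records: List[Record]) -> Dict[str, Tuple[str, int]]:
    """For each allele, return (sample, count) for the sample with the most supporting reads."""
    best: Dict[str, Tuple[str, int]] = {}
    for allele, sample, count in records:
        # Strict '>' keeps the first sample seen when counts are tied.
        if allele not in best or count > best[allele][1]:
            best[allele] = (sample, count)
    return best


if __name__ == "__main__":
    records = [
        ("IGHV1-2*02", "S1", 12),
        ("IGHV1-2*02", "S2", 30),
        ("IGHV3-23*01", "S1", 8),
    ]
    print(max_cov_samples(records))
    # {'IGHV1-2*02': ('S2', 30), 'IGHV3-23*01': ('S1', 8)}
```

Ties are resolved in favour of the first sample encountered; a real implementation might prefer a different tie-break.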

Read support statistics are not yet available for IGH project P25, because that project was analysed before the code that computes them was written.

Column                                Meaning
------------------------------------  ------------------------------------------------------------
REGION_start                          Start position of the allele in the assembly
REGION_end                            End position of the allele in the assembly
Total_Positions                       Number of bases that make up the allele (REGION)
Average_Coverage                      Number of reads that overlap the allele divided by the allele length (Total_Positions); partially overlapping reads are included
Mismatched_Positions                  Number of positions at which more than 20% of reads do not match the assembly base
Matched_Positions                     Number of positions at which more than 80% of reads match the assembly base
Position_Mismatches                   Colon-delimited values, one per position along the allele in the assembly; each value is the number of reads that mismatch the assembly base at that position
Position_Matches                      Colon-delimited values, one per position along the allele in the assembly; each value is the number of reads that match the assembly base at that position
Percent_Accuracy                      Matched_Positions / Total_Positions
Positions_With_At_Least_10x_Coverage  Number of positions with at least 10x coverage
Fully_Spanning_Reads                  Number of HiFi reads that fully span the allele
Fully_Spanning_Reads_100%_Match       Number of HiFi reads that fully span the allele and exactly match the allele sequence
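Several of the columns above are derived from the per-position counts. The Python sketch below shows how Mismatched_Positions, Matched_Positions, Percent_Accuracy and Positions_With_At_Least_10x_Coverage could be recomputed from the Position_Matches and Position_Mismatches fields. It assumes that both fields hold one integer per allele position, in the same order, and that the read depth at a position is the sum of its match and mismatch counts; these are assumptions, not a description of the VDJbase pipeline. The function names are hypothetical, and Percent_Accuracy is returned as a ratio, following the definition in the table.

```python
from dataclasses import dataclass
from typing import List


def parse_counts(field: str) -> List[int]:
    """Split a colon-delimited field such as '12:15:3' into integers."""
    return [int(x) for x in field.split(":") if x != ""]


@dataclass
class SupportStats:
    total_positions: int
    mismatched_positions: int
    matched_positions: int
    percent_accuracy: float
    positions_10x: int


def support_stats(position_matches: str, position_mismatches: str) -> SupportStats:
    """Recompute derived read support statistics from per-position counts (hypothetical helper)."""
    matches = parse_counts(position_matches)
    mismatches = parse_counts(position_mismatches)
    if len(matches) != len(mismatches):
        raise ValueError("both fields must have one entry per allele position")
    mismatched = matched = deep = 0
    for m, mm in zip(matches, mismatches):
        depth = m + mm  # assumption: every covering read either matches or mismatches
        if depth == 0:
            continue  # an uncovered position counts toward none of the totals
        if mm / depth > 0.2:  # more than 20% of reads disagree with the assembly base
            mismatched += 1
        if m / depth > 0.8:  # more than 80% of reads agree with the assembly base
            matched += 1
        if depth >= 10:  # at least 10x coverage
            deep += 1
    total = len(matches)
    accuracy = matched / total if total else 0.0
    return SupportStats(total, mismatched, matched, accuracy, deep)


if __name__ == "__main__":
    print(support_stats("12:15:3:5", "0:1:9:0"))
    # SupportStats(total_positions=4, mismatched_positions=1, matched_positions=3, percent_accuracy=0.75, positions_10x=3)
```

In this example the third position, where 9 of 12 reads disagree with the assembly, is counted as mismatched, and the last position is matched but falls below 10x coverage.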