Read support for genomic alleles

The annotation file for a genomic sample can be found by clicking on the folder icon in the VDJbase Samples tab. This file lists the alleles that were determined from the sample, and includes read support statistics for each discovered allele. The meaning of these statistics is given in the table below.

In the Genes tab of VDJbase, the ‘Max cov sample’ column shows, for each allele, the sample in which it was found with the greatest supporting read depth. ‘Max coverage’ lists, for this sample, the number of reads that exactly matched the allele and covered its full length. Together, these two columns offer a quick way to find a sample that has good supporting statistics for the allele.

Read support statistics are not yet available for the IGH project P25, because that project was analysed before the code that produces these statistics was written.

Column	Meaning
REGION_start	Start position of the allele in the assembly
REGION_end	End position of the allele in the assembly
Total_Positions	Number of bases that make up the allele (REGION)
Average_Coverage	Number of reads that overlap the allele, divided by the allele length (Total_Positions). Reads that only partially overlap the allele are included
Mismatched_Positions	Number of positions at which more than 20% of reads do not match the assembly base
Matched_Positions	Number of positions at which more than 80% of reads match the assembly base
Position_Mismatches	Colon-delimited list of values, one for each position along the allele in the assembly. Each value is the number of reads that mismatch the assembly at that position
Position_Matches	Colon-delimited list of values, one for each position along the allele in the assembly. Each value is the number of reads that match the assembly at that position
Percent_Accuracy	Matched_Positions / Total_Positions
Positions_With_At_Least_10x_Coverage	Number of positions with at least 10X coverage
Fully_Spanning_Reads	Number of HiFi reads that fully span the allele
Fully_Spanning_Reads_100%_Match	Number of HiFi reads that fully span the allele and perfectly match the allele sequence
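
As a rough illustration of how the per-position columns relate to the summary columns, the Python sketch below recomputes Matched_Positions, Mismatched_Positions and Percent_Accuracy from Position_Matches and Position_Mismatches. It rests on assumptions that the table does not confirm: that each colon-delimited field holds one read count per position, and that the read depth at a position is the sum of the matching and mismatching counts. The function names and the example values are hypothetical and are not taken from the VDJbase pipeline.

```python
from typing import Dict, List


def parse_counts(field: str) -> List[int]:
    """Split a colon-delimited string such as "10:9:7" into a list of integers."""
    return [int(value) for value in field.split(":")]


def summarise_allele(row: Dict[str, str]) -> Dict[str, float]:
    """Recompute summary statistics from the per-position counts of one allele.

    Assumption: read depth at a position = matching reads + mismatching reads.
    """
    matches = parse_counts(row["Position_Matches"])
    mismatches = parse_counts(row["Position_Mismatches"])
    total_positions = int(row["Total_Positions"])
    if not (len(matches) == len(mismatches) == total_positions):
        raise ValueError("per-position counts do not agree with Total_Positions")

    matched_positions = 0
    mismatched_positions = 0
    for n_match, n_mismatch in zip(matches, mismatches):
        depth = n_match + n_mismatch
        # Integer arithmetic avoids floating-point rounding at the thresholds.
        if 5 * n_match > 4 * depth:   # more than 80% of reads match
            matched_positions += 1
        if 5 * n_mismatch > depth:    # more than 20% of reads do not match
            mismatched_positions += 1

    return {
        "Matched_Positions": matched_positions,
        "Mismatched_Positions": mismatched_positions,
        # Expressed here as a fraction between 0 and 1.
        "Percent_Accuracy": matched_positions / total_positions,
    }


# Hypothetical five-base allele, for illustration only.
example_row = {
    "Total_Positions": "5",
    "Position_Matches": "10:10:7:9:10",
    "Position_Mismatches": "0:0:3:1:0",
}
summary = summarise_allele(example_row)
print(summary)
```

In this made-up example, the third position has 3 mismatching reads out of 10, so it counts as a mismatched position, and the remaining four positions count as matched. The sketch reports Percent_Accuracy as a fraction; whether the annotation file scales the value to a percentage is not stated in the table.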