The annotation file for a genomic sample can be found by clicking on the folder icon in the VDJbase Samples tab. This file lists the alleles that were determined from the sample. In the Genes tab of VDJbase, the ‘Max cov sample’ column shows, for each allele, the sample in which it was found with the greatest supporting read depth. ‘Max coverage’ gives, for that sample, the number of reads that exactly matched the allele and covered its full length. This provides a quick way to find a sample with good supporting statistics for the allele. The annotation file includes read support statistics for each discovered allele, and the meaning of these statistics is given in the table below.
Read support statistics are not yet available for the IGH project P25, as that project was analysed before the code that generates them was written.
Column | Meaning |
--- | --- |
REGION_start | Start position of the allele in the assembly |
REGION_end | End position of the allele in the assembly |
Total_Positions | Number of bases that make up the allele (REGION) |
Average_Coverage | Number of reads that overlap the allele, divided by the allele length (Total_Positions). Reads that only partially overlap the allele are included |
Mismatched_Positions | Number of positions at which more than 20% of reads do not match the assembly base |
Matched_Positions | Number of positions at which more than 80% of reads match the assembly base |
Position_Mismatches | Colon-delimited list with one entry per position along the allele in the assembly. Each entry is the number of reads that mismatch the assembly at that position |
Position_Matches | Colon-delimited list with one entry per position along the allele in the assembly. Each entry is the number of reads that match the assembly at that position |
Percent_Accuracy | Matched_Positions / Total_Positions |
Positions_With_At_Least_10x_Coverage | Number of positions covered by at least 10 reads (10x coverage) |
Fully_Spanning_Reads | Number of HiFi reads that fully span the allele |
Fully_Spanning_Reads_100%_Match | Number of HiFi reads that fully span the allele and perfectly match the allele sequence |
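To illustrate how the per-position columns relate to the summary columns, the Python sketch below recomputes Total_Positions, Matched_Positions, Mismatched_Positions, Percent_Accuracy and Positions_With_At_Least_10x_Coverage from Position_Matches and Position_Mismatches. It assumes that each colon-delimited list holds one integer per allele position, in the same order, and that the reads covering a position are exactly those counted in the two lists (so reads with, for example, a deletion at a position would be missed). The function names `parse_counts` and `summarise` are hypothetical. This is a sketch under those assumptions, not a description of the VDJbase pipeline, so its results may not agree exactly with the values in an annotation file.

```python
"""Sketch: recompute per-allele summary statistics from the colon-delimited
per-position read counts in a VDJbase annotation file.

Assumptions (not confirmed by the VDJbase documentation):
  * Position_Matches and Position_Mismatches hold one integer per allele
    position, separated by colons, in the same order.
  * The reads covering a position are exactly those counted in the two lists.
The function names below are hypothetical, chosen for illustration only.
"""
from __future__ import annotations


def parse_counts(field: str) -> list[int]:
    """Split a colon-delimited string such as '30:29:30' into integers."""
    return [int(value) for value in field.split(":") if value != ""]


def summarise(position_matches: str, position_mismatches: str) -> dict:
    """Derive the summary columns using the thresholds given in the table."""
    matches = parse_counts(position_matches)
    mismatches = parse_counts(position_mismatches)
    if len(matches) != len(mismatches):
        raise ValueError("match and mismatch lists differ in length")

    matched = mismatched = covered_10x = 0
    for m, mm in zip(matches, mismatches):
        depth = m + mm  # assumed: all reads covering this position
        if depth == 0:
            continue  # no reads: neither matched, mismatched nor covered
        if m / depth > 0.8:  # more than 80% of reads match
            matched += 1
        if mm / depth > 0.2:  # more than 20% of reads mismatch
            mismatched += 1
        if depth >= 10:  # at least 10x coverage
            covered_10x += 1

    total = len(matches)
    return {
        "Total_Positions": total,
        "Matched_Positions": matched,
        "Mismatched_Positions": mismatched,
        "Percent_Accuracy": matched / total if total else 0.0,
        "Positions_With_At_Least_10x_Coverage": covered_10x,
    }
```

For example, `summarise("30:30:5:8:3", "0:1:25:2:0")` reports 5 positions, 3 matched, 1 mismatched, a Percent_Accuracy of 0.6 and 4 positions with at least 10x coverage. The fourth position, where exactly 80% of reads match, counts as neither matched nor mismatched, because both thresholds in the table are strict inequalities.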