OGRDB – Depositing records in NIH repositories

At the end of this deposition process, NCBI will contain two types of record for use in your submission:

A sequence record for each inferred sequence you intend to submit.
One or more select sets for each inferred sequence.

The deposition process for each type is detailed below. As sequence records refer to select sets, you should create the select sets first.

Please note that if you or your lab did not originate the repertoires on which the inferences are based, NCBI will require a peer-reviewed publication specifically describing your inferences before publishing the sequence records. This is an NCBI requirement: ENA does not appear to have this restriction, and IARC does not require such a publication for submission. Please also note that the submission of these records is relatively novel to NCBI, so you may get some questions from them during submission. The notes at the foot of this page provide context should that happen.

Select Set

The select set should be uploaded to the SRA. The metadata context, i.e. links to the repertoire sequencing project, sample and (if possible) experiment, should be maintained. The record should be titled “Reads from supporting inference of <species> <chain> gene” and contain a design description, e.g., “Experimental workflow as described in original SRA record []. Gene inference was performed using <tool, version>. The reported reads were selected based on 96% identity match.” A sample record can be found under accession SRR8298737.
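If you are depositing several select sets, it can help to generate the title and design description from the same template each time. The following is a minimal sketch of that idea, not part of any NCBI or OGRDB tooling. The function names are hypothetical, and the species, chain, tool and accession values in the usage lines are placeholders. It assumes that the empty brackets in the template are meant to hold the accession of the original SRA record.

```python
def select_set_title(species: str, chain: str) -> str:
    """Build the SRA record title from the template given above."""
    return f"Reads from supporting inference of {species} {chain} gene"


def design_description(original_sra_record: str, tool: str, version: str,
                       identity_pct: int = 96) -> str:
    """Build the design description from the template given above.

    Assumption: the original SRA accession goes inside the square brackets.
    """
    return (
        f"Experimental workflow as described in original SRA record "
        f"[{original_sra_record}]. "
        f"Gene inference was performed using {tool}, {version}. "
        f"The reported reads were selected based on {identity_pct}% identity match."
    )


# Placeholder values only; substitute your own species, chain, tool and accession.
title = select_set_title("Homo sapiens", "IGH")
description = design_description("SRR0000000", "ExampleTool", "1.0")
```

The resulting strings can then be pasted into the corresponding fields of the SRA submission.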

Sequence Record

The steps to follow vary, depending on whether you (or your lab) created the repertoires deposited in the SRA.

You (or your lab) created the repertoire(s)

In this case, the sequence records should be created in Genbank. For simplicity, if possible please use the NCBI account used to deposit the repertoire(s).

The Genbank record must link to the select set record via the DBLINK/DR field. Note that the DBLINK field does not appear to be available through the BankIt submission interface. You can use Tbl2asn and Sequin instead, and edit the DBLINK field manually (“Sequence Read Archive” is not one of the options on the template creation page). A sample Genbank deposit can be found under accession MK321694.
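For orientation, the sketch below shows how a DBLINK block is laid out in a Genbank flat file, so that you can check that the published record carries the expected link. It only formats text and does not perform the edit in Sequin or Tbl2asn. The function name is hypothetical and the accessions are placeholders, not real records.

```python
def format_dblink(links: dict[str, str]) -> str:
    """Lay out database cross-references as a flat-file DBLINK block.

    The keyword occupies the first 12 columns of the first line;
    continuation lines are indented by 12 spaces.
    """
    lines = []
    for i, (database, accession) in enumerate(links.items()):
        prefix = "DBLINK".ljust(12) if i == 0 else " " * 12
        lines.append(f"{prefix}{database}: {accession}")
    return "\n".join(lines)


# Placeholder accessions only; use those of your own project and select set.
block = format_dblink({
    "BioProject": "PRJNA000000",
    "Sequence Read Archive": "SRR0000000",
})
print(block)
```

If the “Sequence Read Archive” line is missing from the DBLINK block of your record, the link to the select set has not been made.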

You (or your lab) did not create the repertoire(s)

In this case, the sequence records are submitted as Third Party Annotations (TPA). NCBI requires TPAs to be supported by a peer-reviewed publication. The publication describing the repertoires on which the inferences are based is not sufficient: it must be a publication describing the inferences made from the repertoires. This requirement appears to be specific to NCBI: OGRDB does not require such a publication for submission.

The TPA record must link to the associated select set records via the DBLINK/DR field. A sample TPA deposit can be found under accession BK01573.

Note

It is reasonably likely, in the short term, that you will encounter questions from SRA/Genbank staff about the nature of these deposits. If so, you can respond that they are made as part of a community effort to document novel alleles, with an emphasis on transparency in data provenance. You can link to the IARC page and note that we worked together with IMGT and Genbank/TPA staff in designing this procedure.