Statistics & downloads help
Statistics & files
The Statistics & downloads page contains tables that break down the number of approved symbol reports in the database by locus group and locus type. The tables also contain the icons described below, which let users download the data in text (tsv) or JSON format.
The icons are as follows:
- Tab delimited text file. Multiple valued fields are double quoted and delimited by | within the quotes. The file should be easily viewable within a spreadsheet application such as Excel.
- JSON text file (no indentation or white space). Intended for loading into a JSON parser within a script or program.
- Link to the Custom downloads page for the locus type/group where users can specify exactly what data they wish to download.
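The tab delimited format described above can be read with a standard CSV parser. The following is a minimal Python sketch, not an official VGNC tool: the sample rows and their values are made up for illustration, and the choice of which columns to treat as multiple valued (`alias_symbol`, `prev_symbol`) is an assumption.

```python
import csv
import io

# Hypothetical sample data: the column names are real field names,
# but the rows and values are invented for this example.
SAMPLE_TSV = (
    "vgnc_id\tsymbol\talias_symbol\tprev_symbol\n"
    'VGNC:1\tGENE1\t"ALIASA|ALIASB"\t\n'
    "VGNC:2\tGENE2\tALIASC\tOLD2\n"
)

# Assumption: these columns may hold several values separated by "|".
MULTI_VALUED = ("alias_symbol", "prev_symbol")


def parse_tsv(text):
    """Parse tab delimited text and split multiple valued fields into lists."""
    # The csv module strips the surrounding double quotes for us.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quotechar='"')
    rows = []
    for row in reader:
        for field in MULTI_VALUED:
            if field in row:
                value = row[field]
                # An empty cell becomes an empty list rather than [""].
                row[field] = value.split("|") if value else []
        rows.append(row)
    return rows


rows = parse_tsv(SAMPLE_TSV)
for row in rows:
    print(row["symbol"], row["alias_symbol"], row["prev_symbol"])
```

For a downloaded file, the same function could be given the file's contents in place of `SAMPLE_TSV`.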
Above the tables there is a drop-down menu that allows you to select a specific chromosome, which changes the table statistics to show the data for the selected chromosome.
Beneath the tables we also have text (tsv) and JSON files for our complete VGNC dataset, our gene groups dataset and our locus specific database links set.
Fields within the tsv and JSON files
- taxon_id
- Taxon ID. The taxonomy identifier for the species the gene belongs to. This taxon ID is taken from the NCBI taxonomy browser.
- vgnc_id
- VGNC ID. A unique ID created by the VGNC for every approved symbol.
- symbol
- The VGNC approved gene symbol. Equates to the "Approved symbol" field within the gene symbol report.
- name
- VGNC approved name for the gene. Equates to the "Approved name" field within the gene symbol report.
- locus_group
- A group name for a set of related locus types as defined by the VGNC (e.g. pseudogene).
- locus_type
- The locus type as set by the VGNC.
- status
- Status of the symbol report, which can be either "Approved" or "Entry Withdrawn".
- location
- Cytogenetic location of the gene (e.g. 2q34).
- location_sortable
- Same as "location" but single digit chromosomes are prefixed with a 0 enabling them to be sorted in correct numerical order (e.g. 02q34).
- alias_symbol
- Other symbols used to refer to this gene as seen in the "Alias symbols" field in the gene symbol report.
- alias_name
- Other names used to refer to this gene as seen in the "Alias names" field in the gene symbol report.
- prev_symbol
- Gene symbols previously approved by the VGNC for this gene. Equates to the "Previous symbols" field within the gene symbol report.
- prev_name
- Gene names previously approved by the VGNC for this gene. Equates to the "Previous names" field within the gene symbol report.
- gene_group
- The gene group name as set by the VGNC and seen at the top of the gene group reports.
- gene_group_id
- ID used to designate a gene group the gene has been assigned to.
- date_approved_reserved
- The date the entry was first approved.
- date_symbol_changed
- The date the approved symbol was last changed.
- date_name_changed
- The date the approved name was last changed.
- date_modified
- Date the entry was last modified.
- ensembl_gene_id
- Ensembl gene ID. Found within the "Gene resources" section of the gene symbol report.
- ncbi_id
- NCBI gene ID. Found within the "Gene resources" section of the gene symbol report.
- uniprot_ids
- UniProt protein accession. Found within the "Protein resource" section of the gene symbol report.
- pubmed_id
- PubMed and Europe PubMed Central PMID(s).
- bgd_id
- Symbol used within the Bovine Genome Database for the gene.
- horde_id
- Symbol used within HORDE for the gene.
- hgnc_orthologs
- HGNC database IDs. Found within the "Orthologs" section of the gene symbol report.
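To illustrate why the "location_sortable" field exists, the sketch below zero-pads single digit chromosomes in the same way as the example given for that field (2q34 becomes 02q34) and compares the resulting sort order with a plain text sort. The function `to_sortable` is a hypothetical helper written for this example; it is not how the VGNC generates the field, and the downloaded files already contain the padded value.

```python
import re


def to_sortable(location):
    """Prefix a single digit chromosome number with 0, e.g. 2q34 -> 02q34."""
    # A leading digit that is not followed by another digit is treated
    # as a single digit chromosome; anything else is left unchanged.
    return re.sub(r"^(\d)(?!\d)", r"0\1", location)


locations = ["12p13", "2q34", "Xq28", "10q11"]

# A plain text sort places chromosome 10 and 12 before chromosome 2.
by_raw = sorted(locations)

# Sorting on the padded form restores the numerical chromosome order.
by_sortable = sorted(locations, key=to_sortable)

print(by_raw)
print(by_sortable)
```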