Symbol report help
Each gene with an approved VGNC symbol has its own Symbol Report that contains our manually curated data and links to other external biomedical resources. The VGNC "core data" is displayed at the top of the page in a separate box and presents the approved nomenclature, the unique VGNC ID number, aliases, previous nomenclature, locus type, chromosomal location and gene group. The table below the VGNC "core data" provides links to external resources such as homologs in other species, gene resources, protein resources, and publications.
The text that follows is a field-by-field guide to the information provided in the Symbol Report.
Core Data fields
Approved symbol
The official gene symbol approved by the VGNC, which is typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature (please refer to the guidelines page on the HGNC website).
VGNC ID
A unique ID provided by the VGNC for each gene with an approved symbol. IDs are of the format VGNC:n, where n is a unique number. VGNC IDs remain stable even if a name or symbol changes.
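As an illustration of the ID format described above, the following minimal sketch checks that a string looks like VGNC:n and extracts the number. The function name `parse_vgnc_id` and the example IDs are hypothetical, and the sketch assumes that n is written as plain decimal digits; it does not check whether the ID actually exists in the VGNC database.

```python
import re

# Assumption: "VGNC:" followed by one or more decimal digits, nothing else.
VGNC_ID_PATTERN = re.compile(r"VGNC:(\d+)")


def parse_vgnc_id(text: str) -> int:
    """Return the numeric part of a VGNC ID, or raise ValueError."""
    match = VGNC_ID_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a VGNC ID: {text!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # "VGNC:12345" is a made-up ID used only to show the format.
    print(parse_vgnc_id("VGNC:12345"))
```

Because VGNC IDs remain stable when a symbol or name changes, keying local records on the ID rather than on the symbol is usually the safer choice.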
Species
The GenBank common name for the species this gene belongs to, as assigned by the NCBI.
Taxonomy ID
The taxonomy identifier for the species this gene belongs to. This taxon ID is taken from the NCBI taxonomy browser.
Alias symbols
Alternative symbols that have been used to refer to the gene. Aliases may be from literature, from other databases or may be added to represent membership of a gene group.
Alias names
Alternative names for the gene. Aliases may be from literature, from other databases or may be added to represent membership of a gene group.
Locus type
Specifies the genetic class of each gene entry. All VGNC locus types are listed below:
- gene with protein product - protein-coding genes (the protein may be predicted and of unknown function) (SO:0001217)
- RNA, cluster - region containing a cluster of small non-coding RNA genes
- RNA, long non-coding - non-protein coding genes that encode long non-coding RNAs (lncRNAs) (SO:0001877); these are at least 200 nt in length. Subtypes include intergenic (SO:0001463), intronic (SO:0001903) and antisense (SO:0001904).
- RNA, micro - non-protein coding genes that encode microRNAs (miRNAs) (SO:0001265)
- RNA, ribosomal - non-protein coding genes that encode ribosomal RNAs (rRNAs) (SO:0001637)
- RNA, small nuclear - non-protein coding genes that encode small nuclear RNAs (snRNAs) (SO:0001268)
- RNA, small nucleolar - non-protein coding genes that encode small nucleolar RNAs (snoRNAs) containing C/D or H/ACA box domains (SO:0001267)
- RNA, small cytoplasmic - non-protein coding genes that encode small cytoplasmic RNAs (scRNAs) (SO:0001266)
- RNA, transfer - non-protein coding genes that encode transfer RNAs (tRNAs) (SO:0001272)
- RNA, small misc - non-protein coding genes that encode miscellaneous types of small ncRNAs, such as vault (SO:0000404) and Y (SO:0000405) RNA genes
- pseudogene - genomic DNA sequences that are similar to protein-coding genes but do not encode a functional protein (SO:0000336)
- complex locus constituent - transcriptional unit that is part of a named complex locus
- endogenous retrovirus - integrated retroviral elements that are transmitted through the germline (SO:0000100)
- fragile site - a heritable locus on a chromosome that is prone to DNA breakage
- immunoglobulin gene - gene segments that undergo somatic recombination to form heavy or light chain immunoglobulin genes (SO:0000460). Also includes immunoglobulin gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term "non-functional" in the gene name.
- immunoglobulin pseudogene - immunoglobulin gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
- protocadherin - gene segments that constitute the three clustered protocadherins (alpha, beta and gamma)
- readthrough - a naturally occurring transcript containing coding sequence from two or more genes that can also be transcribed individually
- region - extents of genomic sequence that contain one or more genes, also applied to non-gene areas that do not fall into other types
- T cell receptor gene - gene segments that undergo somatic recombination to form either alpha, beta, gamma or delta chain T cell receptor genes (SO:0000460). Also includes T cell receptor gene segments with open reading frames that either cannot undergo somatic recombination, or encode a peptide that is not predicted to fold correctly; these are identified by inclusion of the term "non-functional" in the gene name.
- T cell receptor pseudogene - T cell receptor gene segments that are inactivated due to frameshift mutations and/or stop codons in the open reading frame
- transposable element - a segment of repetitive DNA that can move, or retrotranspose, to new sites within the genome (SO:0000101)
- unknown - entries where the locus type is currently unknown
- virus integration site - target sequence for the integration of viral DNA into the genome
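For readers who process locus types programmatically, the list above can be sketched as a simple lookup from locus type to its primary Sequence Ontology (SO) accession. The mapping below copies only the accessions quoted in the list; the names `LOCUS_TYPE_TO_SO` and `so_accession_for` are hypothetical, and subtype accessions (for example those for lncRNA subtypes or vault and Y RNAs) are deliberately left out.

```python
from typing import Optional

# Primary SO accessions as quoted in the locus type list above.
# Locus types without a quoted accession are simply absent.
LOCUS_TYPE_TO_SO = {
    "gene with protein product": "SO:0001217",
    "RNA, long non-coding": "SO:0001877",
    "RNA, micro": "SO:0001265",
    "RNA, ribosomal": "SO:0001637",
    "RNA, small nuclear": "SO:0001268",
    "RNA, small nucleolar": "SO:0001267",
    "RNA, small cytoplasmic": "SO:0001266",
    "RNA, transfer": "SO:0001272",
    "pseudogene": "SO:0000336",
    "endogenous retrovirus": "SO:0000100",
    "immunoglobulin gene": "SO:0000460",
    "T cell receptor gene": "SO:0000460",
    "transposable element": "SO:0000101",
}


def so_accession_for(locus_type: str) -> Optional[str]:
    """Return the SO accession for a locus type, or None if none is listed."""
    return LOCUS_TYPE_TO_SO.get(locus_type.strip())


if __name__ == "__main__":
    print(so_accession_for("RNA, micro"))
    print(so_accession_for("fragile site"))
```

Note that immunoglobulin and T cell receptor genes share one accession in the list, so the mapping cannot be inverted to recover a unique locus type.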
Chromosomal location
Indicates the cytogenetic location of the gene or region on the chromosome. In the absence of that information one of the following may be listed:
- not on reference assembly - named gene is not annotated on the current version of the Genome Reference Consortium human reference assembly; may have been annotated on previous assembly versions or on a non-reference human assembly
- unplaced - named gene is annotated on an unplaced/unlocalized scaffold of the human reference assembly
- reserved - named gene has never been annotated on any human assembly
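When handling this field in a script, it can help to separate real cytogenetic locations from the three placeholder values listed above. The following is a minimal sketch under the assumption that the placeholders appear exactly as written in the list; the function name `has_cytogenetic_location` and the example band are hypothetical.

```python
# Placeholder values taken from the list above.
PLACEHOLDER_LOCATIONS = {
    "not on reference assembly",
    "unplaced",
    "reserved",
}


def has_cytogenetic_location(value: str) -> bool:
    """Return True if the field holds something other than a placeholder."""
    cleaned = value.strip().lower()
    if not cleaned:
        # An empty field carries no location information.
        return False
    return cleaned not in PLACEHOLDER_LOCATIONS


if __name__ == "__main__":
    # "5q31" is a made-up example of a cytogenetic band.
    for value in ("5q31", "unplaced", "reserved"):
        print(value, has_cytogenetic_location(value))
```

The check only rules out the known placeholders; it does not validate that the remaining text is a well-formed cytogenetic band.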
Gene groups
Links to VGNC-curated gene group pages. Each link is to the relevant gene group the gene has been assigned to, according to either sequence similarity or information from publications, specialist advisors for that group or other databases. Groups may be either structural or functional; note that a gene may belong to more than one group.
All other symbol report data fields
Specialist resources
This section only appears on Symbol Reports if the gene in question is listed in an external database which is specific to certain classes of genes. A full list is provided here:
- HORDE - Human Olfactory Receptor Data Exploratorium
Gene resources
Provides links to external pages dedicated to information on the gene and to genome browsers. Links are to the following pages:
- The NCBI Gene page provides curated sequence and descriptive information about genetic loci including official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. There is also a link to the gene annotation at the NCBI Sequence Viewer, the graphical display for the NCBI Nucleotide and Protein databases.
- The Ensembl Gene View displays data associated at the gene level such as orthologs, paralogs, regulatory regions and splice variants. There is a link to the gene annotation at the Ensembl Genome Browser.
Orthologs from selected species
This section contains links to orthologs of the gene in selected species. The table contains the following:
- Human orthologs that link to the HGNC resource. Human gene symbols are approved by the HUGO Gene Nomenclature Committee (HGNC).
- Other vertebrate orthologs named by the Vertebrate Gene Nomenclature Committee (VGNC).
Protein resources
Information on proteins encoded by the gene in question. Links are made via UniProt protein accessions. There are four possible links per Symbol Report:
- The UniProt page for the protein product encoded by the gene. The UniProt Protein Knowledgebase is a curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. We do not map to TrEMBL entries within UniProt, only to Swiss-Prot entries, as these are manually annotated and reviewed.
- The InterPro page mapped to the displayed UniProt protein accession. InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes.
- The PDBe page mapped to the displayed UniProt accession. PDBe is a founding member of the Worldwide Protein Data Bank which collects, organises and disseminates data on biological macromolecular structures.
- The Reactome protein-level page mapped to the displayed UniProt protein accession. Reactome is a manually curated and peer-reviewed pathway database.
References
Displays the title, (first) author, journal information and links to PubMed and Europe PubMed Central. The abstract and full list of authors can also be viewed by clicking on the '+' icon next to the links. This section aims to reference a limited number of key papers that describe the gene and/or its products, or are particularly relevant to its nomenclature and/or function; it does not aim to be an exhaustive bibliography.