import_hgnc_dataset {hgnc} | R Documentation |
Import HGNC data
Description
import_hgnc_dataset() imports HGNC data into R. Specify a directory path
in addition if you wish to save the data to disk.
Usage
import_hgnc_dataset(file = latest_archive_url())
Arguments
file |
A file or URL of the complete HGNC data set (in TSV format).
Use |
Value
A tibble of the HGNC data set consisting of 55 columns:
- hgnc_id: A unique ID provided by HGNC for each gene with an approved symbol. IDs are of the format 'HGNC:n', where n is a unique number. HGNC IDs remain stable even if a name or symbol changes.
- hgnc_id2: A stripped down version of hgnc_id where the prefix 'HGNC:' has been removed. This column is added by the package {hgnc}.
- symbol: The official gene symbol approved by the HGNC, typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature.
- name: The full gene name approved by the HGNC; corresponds to the approved symbol above.
- locus_group: A group name for a set of related locus types as defined by the HGNC. One of: 'protein-coding gene', 'non-coding RNA', 'pseudogene' or 'other'.
- locus_type: Specifies the genetic class of each gene entry, including various types of RNA and other gene-related categories, such as pseudogenes and virus integration sites.
- status: Status of the symbol report, which can be either 'Approved' or 'Entry Withdrawn'.
- location: Chromosomal location. Indicates the cytogenetic location of the gene or region on the chromosome, e.g., '19q13.43'. In the absence of that information, it may be listed as 'not on reference assembly', 'unplaced', or 'reserved'.
- location_sortable: A sortable version of the location column, allowing easier sorting by chromosomal location.
- alias_symbol: Alternative symbols that have been used to refer to the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
- alias_name: Alternative names for the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
- prev_symbol: Any symbols that were previously HGNC-approved nomenclature.
- prev_name: Any names that were previously HGNC-approved nomenclature.
- gene_group: A gene group. Each gene has been assigned to one or more groups, according to either sequence similarity or information from publications, specialist advisors, or other databases.
- gene_group_id: Gene group identifier, an integer number. See gene_group for the gene group name.
- date_approved_reserved: The date the entry was first approved.
- date_symbol_changed: The date the gene symbol was last changed.
- date_name_changed: The date the gene name was last changed.
- date_modified: The date the entry was last modified.
- entrez_id: Entrez gene identifier.
- ensembl_gene_id: Ensembl gene identifier.
- vega_id: VEGA gene identifier.
- ucsc_id: UCSC gene identifier.
- ena: International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s).
- refseq_accession: The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI.
- ccds_id: Consensus CDS identifier.
- uniprot_ids: UniProt protein accession.
- pubmed_id: PubMed and Europe PubMed Central PMIDs.
- mgd_id: Mouse Genome Informatics database identifier.
- rgd_id: Rat Genome Database gene identifier.
- lsdb: The name of the Locus Specific Mutation Database and URL for the gene.
- cosmic: Symbol used within the Catalogue of Somatic Mutations in Cancer for the gene.
- omim_id: Online Mendelian Inheritance in Man (OMIM) identifier.
- mirbase: miRBase identifier.
- homeodb: Homeobox Database identifier.
- snornabase: snoRNABase identifier.
- bioparadigms_slc: Symbol used to link to the SLC tables database at bioparadigms.org for the gene.
- orphanet: Orphanet identifier.
- pseudogene_org: Pseudogene.org identifier.
- horde_id: Symbol used within HORDE for the gene.
- merops: Identifier used to link to the MEROPS peptidase database.
- imgt: Symbol used within the international ImMunoGeneTics information system.
- iuphar: The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database.
- kznf_gene_catalog: Lawrence Livermore National Laboratory Human KZNF Gene Catalog (LLNL) identifier.
- mamit_trnadb: Identifier to link to the Mamit-tRNA database.
- cd: Symbol used within the Human Cell Differentiation Molecule database for the gene.
- lncrnadb: lncRNA Database identifier.
- enzyme_id: ENZYME EC accession number.
- intermediate_filament_db: Identifier used to link to the Human Intermediate Filament Database.
- rna_central_ids: Identifier in RNAcentral, the non-coding RNA sequence database.
- lncipedia: The LNCipedia identifier to which the gene belongs. This will only appear if the gene is a long non-coding RNA.
- gtrnadb: The GtRNAdb identifier to which the gene belongs. This will only appear if the gene is a tRNA.
- agr: The Alliance of Genome Resources HGNC ID for the human gene page within the resource.
- mane_select: MANE Select nucleotide accession with version (i.e., NCBI RefSeq or Ensembl transcript ID and version).
- gencc: Gene Curation Coalition (GenCC) Database identifier.
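
The relationship between hgnc_id and hgnc_id2, and the use of the status and locus_group columns for filtering, can be sketched in base R on a small stand-in table. This is only an illustration: the data frame hgnc_toy and all of its values are invented, it is a plain data.frame rather than the tibble returned by import_hgnc_dataset(), and the sub() call is an assumed equivalent of how the package strips the prefix, not its actual implementation. The sketch keeps hgnc_id2 as character, which may differ from the column type in the real data set.

```r
# Toy stand-in for the table returned by import_hgnc_dataset().
# All values are invented for illustration; they are not real HGNC records.
hgnc_toy <- data.frame(
  hgnc_id     = c("HGNC:1", "HGNC:2", "HGNC:3"),
  symbol      = c("GENEA", "GENEB", "GENEC"),
  locus_group = c("protein-coding gene", "non-coding RNA", "protein-coding gene"),
  status      = c("Approved", "Approved", "Entry Withdrawn"),
  stringsAsFactors = FALSE
)

# Assumed derivation of hgnc_id2: drop the leading 'HGNC:' prefix.
hgnc_toy$hgnc_id2 <- sub("^HGNC:", "", hgnc_toy$hgnc_id)

# Keep only approved protein-coding genes.
approved_pc <- hgnc_toy[
  hgnc_toy$status == "Approved" &
    hgnc_toy$locus_group == "protein-coding gene",
]

print(approved_pc[, c("hgnc_id", "hgnc_id2", "symbol")])
```

The same subsetting pattern should carry over to the imported data, since it relies only on the column names and values documented above.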
Examples
## Not run: import_hgnc_dataset()