Search
Duplicate
🌄

pybiomart 사용법

Documentation

Install

conda install -c bioconda pybiomart
Shell
복사

Quickstart

from pybiomart import Dataset dataset = Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org') res = dataset.query(attributes=['ensembl_gene_id', 'external_gene_name']) server = Server(host='http://www.ensembl.org') server.list_marts() mart = server['ENSEMBL_MART_ENSEMBL'] mart.list_datasets() dataset = mart['hsapiens_gene_ensembl'] # Datset의 attribute를 얻습니다. dataset.list_attributes()
Python
복사

Mapping from ensg to symbol (Protein-coding gene only)

from pybiomart import Dataset dataset = Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org') res = dataset.query(attributes=['ensembl_gene_id', 'external_gene_name', 'gene_biotype']) ensg2symbol = {r['Gene stable ID']:r['Gene name'] for r in res[res['Gene type'] == 'protein_coding'].to_records()}
Python
복사

Mapping from ensg to symbol

from pybiomart import Dataset dataset = Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org') res = dataset.query(attributes=['ensembl_gene_id', 'external_gene_name', 'gene_biotype']) ensg2symbol = {r['Gene stable ID']:r['Gene name'] for r in res.to_records()}
Python
복사

Mapping from ensg to symbol (mouse)

from pybiomart import Dataset dataset = Dataset(name='mmusculus_gene_ensembl', host='http://www.ensembl.org') res = dataset.query(attributes=['ensembl_gene_id', 'external_gene_name', 'gene_biotype']) ensg2symbol = {r['Gene stable ID']:r['Gene name'] for r in res.to_records()}
Shell
복사

Entrez ID to symbol

dataset = Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org') res = dataset.query(attributes=['external_gene_name', 'entrezgene_id']).dropna() entrez2symbol = {str(int(r['NCBI gene (formerly Entrezgene) ID'])):r['Gene name'] for r in res.to_records()}
Shell
복사