GitPedia

Ncbitax2lin

๐Ÿž Convert NCBI taxonomy dump into lineages

From zyxue · Updated June 3, 2026

Convert NCBI taxonomy dump into lineages. The project is written primarily in Python, distributed under the MIT License, and was first published in 2016. Key topics include: lineage, ncbi, ncbi-taxonomy, pandas, python.

NCBItax2lin

Convert NCBI taxonomy dump into lineages. An example for human (tax_id=9606) looks like this:

The output has one row per tax_id and one column per rank. The single row for 9606 is shown transposed here, with its non-empty columns in output order:

Column        Value
tax_id        9606
superkingdom  Eukaryota
phylum        Chordata
class         Mammalia
order         Primates
family        Hominidae
genus         Homo
species       Homo sapiens
infraorder    Simiiformes
kingdom       Metazoa
no rank       cellular organisms
no rank1      Opisthokonta
no rank10     Dipnotetrapodomorpha
no rank11     Tetrapoda
no rank12     Amniota
no rank13     Theria
no rank14     Eutheria
no rank15     Boreoeutheria
no rank2      Eumetazoa
no rank3      Bilateria
no rank4      Deuterostomia
no rank5      Vertebrata
no rank6      Gnathostomata
no rank7      Teleostomi
no rank8      Euteleostomi
no rank9      Sarcopterygii
parvorder     Catarrhini
subfamily     Homininae
suborder      Haplorrhini
subphylum     Craniata
superfamily   Hominoidea
superorder    Euarchontoglires

Columns that are empty for this tax_id: family1, forma, genus1, infraclass, no rank16, no rank17, no rank18, no rank19, no rank20, no rank21, no rank22, species group, species subgroup, species1, subclass, subgenus, subkingdom, subspecies, subtribe, superclass, superorder1, superphylum, tribe, varietas.

Install

ncbitax2lin supports Python 3.9 to 3.13.

pip install -U ncbitax2lin

It is also available in Conda on the Bioconda channel:

conda install bioconda::ncbitax2lin

Generate lineages

First download taxonomy dump from NCBI:

wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump

Then run ncbitax2lin:

ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp

By default, the generated lineages are saved to
ncbi_lineages_[date_of_utcnow].csv.gz. The output path can be changed with the
--output option.
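
Once the file exists, a single lineage can be looked up by tax ID. The following is a minimal sketch using only the Python standard library. It assumes the output is a gzip-compressed CSV with a tax_id column, as in the human example above; the helper name find_lineage is hypothetical and not part of ncbitax2lin.

```python
import csv
import gzip


def find_lineage(csv_gz_path, tax_id):
    """Return the non-empty column -> value pairs for one tax_id, or None if absent.

    Hypothetical helper: assumes a gzip-compressed CSV with a "tax_id" column.
    """
    wanted = str(tax_id)
    # Open the compressed file in text mode so csv can read it line by line.
    with gzip.open(csv_gz_path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row["tax_id"] == wanted:
                # Most rank columns are empty for any given taxon; drop them.
                return {column: value for column, value in row.items() if value}
    return None
```

Calling find_lineage(path, 9606) on a generated file would return only the filled-in columns of the human row. The scan is linear, so for many lookups it is likely better to load the file once into a dictionary or a pandas DataFrame.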

FAQ

Q: I have a large number of sequences with their corresponding accession
numbers from NCBI. How do I get their lineages?

A: First, map the accession numbers (GI is deprecated) to tax IDs using the
nucl_*accession2taxid.gz files from
ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/. Second, trace each
sequence's whole lineage from its tax ID. The tax-ID-to-lineage mapping is
what NCBItax2lin generates for you.
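
The two steps in the answer above can be sketched as follows. This is an illustration under stated assumptions, not part of ncbitax2lin: it assumes the accession2taxid file is tab-separated with a header row containing accession.version and taxid columns, and that the lineage file is a gzip-compressed CSV with a tax_id column. Both function names are hypothetical.

```python
import csv
import gzip


def accessions_to_taxids(accession2taxid_gz, accessions):
    """Step 1: stream the mapping file and return {accession.version: taxid}."""
    wanted = set(accessions)
    found = {}
    with gzip.open(accession2taxid_gz, mode="rt", newline="", encoding="utf-8") as handle:
        # Assumed layout: tab-separated, header includes "accession.version" and "taxid".
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["accession.version"] in wanted:
                found[row["accession.version"]] = row["taxid"]
                if len(found) == len(wanted):
                    break  # every requested accession has been resolved
    return found


def attach_lineages(acc_to_taxid, lineages_csv_gz):
    """Step 2: return {accession: lineage row, or None if the tax ID is missing}."""
    needed = set(acc_to_taxid.values())
    by_taxid = {}
    with gzip.open(lineages_csv_gz, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row["tax_id"] in needed:
                by_taxid[row["tax_id"]] = row
    return {acc: by_taxid.get(taxid) for acc, taxid in acc_to_taxid.items()}
```

Both files are streamed rather than loaded whole, since the accession2taxid files are large; only the requested accessions and their lineage rows are kept in memory.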

If you have any questions about this project, please feel free to create a new
issue.

Note on taxdump.tar.gz.md5

It appears that NCBI periodically regenerates taxdump.tar.gz and
taxdump.tar.gz.md5 even when the content is unchanged. I am not sure how
the regeneration works, but taxdump.tar.gz.md5 will differ simply because
of a different timestamp.
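
If the archive checksum cannot be trusted to signal a real change, one alternative is to compare the files inside two downloads instead. The sketch below hashes each archive member's contents and ignores archive-level metadata such as timestamps. It assumes both files are ordinary gzip-compressed tarballs; the function names are hypothetical, and SHA-256 is used here purely as a content fingerprint.

```python
import hashlib
import tarfile


def member_digests(tar_gz_path):
    """Return {member name: SHA-256 of its contents} for regular files in the archive."""
    digests = {}
    with tarfile.open(tar_gz_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            extracted = archive.extractfile(member)
            digest = hashlib.sha256()
            # Read in 1 MiB chunks so large members such as names.dmp are not held in memory.
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                digest.update(chunk)
            digests[member.name] = digest.hexdigest()
    return digests


def same_content(old_tar_gz, new_tar_gz):
    """True if both archives hold the same file names with identical contents."""
    return member_digests(old_tar_gz) == member_digests(new_tar_gz)
```

With this, same_content("old/taxdump.tar.gz", "new/taxdump.tar.gz") (hypothetical paths) reports whether the lineages need regenerating, regardless of what the .md5 file says.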

Used in

  • Mahmoudabadi, G., & Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. eLife, 7. https://doi.org/10.7554/eLife.31955
  • Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w
  • Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433
  • Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. https://msystems.asm.org/content/5/4/e00613-20
  • Byadgi, O. et al. (2020) Transcriptome analysis of Amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252
  • Cumbo, F., & Blankenberg, D. (2025). Characterization of microbial dark matter at scale with MetaSBT and taxonomy-aware Sequence Bloom Trees. bioRxiv. https://doi.org/10.1101/2025.08.25.672238

Development

Install dependencies

poetry install --sync

Testing

make format
make all

Publish (only for administrator)

poetry version [minor/major etc.]
git tag vx.y.z
git push origin vx.y.z
poetry publish --build -u __token__ --password pypi-<token-from-pypi>

Update CHANGELOG.md.

This article is auto-generated from zyxue/ncbitax2lin via the GitHub API. Last fetched: 6/20/2026