Content types
A Python library to map file extensions to MIME types without accessing the file data. #pypackage
A comprehensive Python library to map file extensions to MIME types with **360+ supported formats**. It also provides a CLI for quick lookups right from your terminal. If no known mapping is found, the tool returns `application/octet-stream`. The project is written primarily in Python, distributed under the MIT License, and first published in 2025. Key topics include: content-types, file-extensions, mime, mime-types, python.
content-types
Unlike other libraries, this one does not try to access the file
or parse the bytes of the file or stream. It just looks at the extension,
which is valuable when you don't have access to the file directly.
For example, you may know the filename of an object stored in S3 and not want
to download it just to inspect it.
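As a rough illustration of that S3 scenario, the sketch below derives a content type from object keys alone, with no network access. The `TINY_TABLE` mapping and the `content_type_for_key` helper are hypothetical stand-ins made up for this example, not the library's implementation; in real code you would call `content_types.get_content_type()` instead.

```python
import posixpath

# Hypothetical, deliberately tiny table for illustration only.
TINY_TABLE = {
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
    ".parquet": "application/vnd.apache.parquet",
}


def content_type_for_key(key: str) -> str:
    """Guess a content type from an object key without reading any bytes."""
    _, ext = posixpath.splitext(key)
    # Unknown or missing extensions fall back to the generic binary type.
    return TINY_TABLE.get(ext.lower(), "application/octet-stream")


keys = ["reports/2025/summary.PDF", "raw/events.parquet", "misc/blob"]
guesses = {key: content_type_for_key(key) for key in keys}
for key, guess in guesses.items():
    print(f"{key} -> {guess}")
```

Because only the key string is inspected, the same approach works for listings from any object store or database column of filenames.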
Documentation
Full documentation is hosted at mkennedy.codes/docs/content-types.
There you'll find a searchable API reference
for get_content_type(), the complete extension-to-type mapping, and the shortcut
constants. The quick examples below cover the essentials.
Extensive Format Support
With 360+ file extensions mapped, content-types covers:
- Images - Standard formats plus RAW camera files (Canon, Nikon, Sony, Adobe DNG, etc.)
- Audio - MP3, FLAC, AAC, MIDI, WMA, ALAC, DSD, and more
- Video - MP4, MKV, WebM, FLV, and modern codecs
- Archives - ZIP, TAR, 7Z, RAR, plus modern formats (bz2, xz, zstd, brotli)
- Documents - PDF, Office formats (DOCX, XLSX, PPTX), OpenDocument
- Programming - Python, JavaScript, TypeScript, Rust, Go, Java, C++, Swift, Kotlin, and 25+ languages
- Data Science - Parquet, Jupyter notebooks, HDF5, Arrow, Pickle, NumPy, R, Stata, SAS, SPSS
- Configuration - YAML, TOML, JSON, INI, ENV, dotfiles
- DevOps - Dockerfiles, Terraform, Kubernetes configs, Nomad
- Creative Suite - Adobe (PSD, InDesign, Premiere, After Effects), CAD files (AutoCAD, SketchUp, Blender)
- Game Development - Unity, Unreal Engine, PAK files
- Scientific - FITS, DICOM, NIfTI, PDB (protein data)
- Blockchain - Solidity, Vyper smart contracts
- Databases - SQLite, Access, MySQL files
- Documentation - Markdown, AsciiDoc, Org-mode, BibTeX
...and much more!
Why not just use Python's built-in mimetypes? Or the excellent python-magic package?
See below.
Installation
Requires Python 3.10 or later.
```bash
uv pip install content-types
```
Usage
```python
import content_types

# Forward lookup: filename -> MIME type
the_type = content_types.get_content_type("example.jpg")
print(the_type)  # "image/jpeg"

# Works with any supported extension
print(content_types.get_content_type("data.parquet"))  # "application/vnd.apache.parquet"
print(content_types.get_content_type("notebook.ipynb"))  # "application/x-ipynb+json"
print(content_types.get_content_type("photo.cr2"))  # "image/x-canon-cr2"
print(content_types.get_content_type("model.blend"))  # "application/x-blender"
print(content_types.get_content_type("contract.sol"))  # "text/x-solidity"

# For very common files, you have shortcuts:
print(f'Content-Type for webp is {content_types.webp}.')
# Content-Type for webp is image/webp.

# Data science shortcuts
print(content_types.parquet)  # "application/vnd.apache.parquet"
print(content_types.ipynb)  # "application/x-ipynb+json"
print(content_types.pkl)  # "application/octet-stream"
print(content_types.yaml)  # "application/yaml"
print(content_types.toml)  # "application/toml"
print(content_types.sqlite)  # "application/vnd.sqlite3"

# Works with Path objects too
from pathlib import Path

path = Path("document.pdf")
print(content_types.get_content_type(path))  # "application/pdf"

# URLs work too: query strings and fragments are stripped before lookup
url = "https://cdn.example.com/song.mp3?cache_id=678c2a"
print(content_types.get_content_type(url))  # "audio/mpeg"

# Unknown extensions fall back to 'application/octet-stream' by default;
# pass treat_as_binary=False to fall back to 'text/plain' instead.
print(content_types.get_content_type("notes.unknownext"))  # "application/octet-stream"
print(content_types.get_content_type("notes.unknownext", treat_as_binary=False))  # "text/plain"

# Or supply your own fallback for unknown extensions; it takes precedence
# over treat_as_binary. Known extensions still resolve normally.
print(content_types.get_content_type("notes.unknownext", fallback="application/x-custom"))  # "application/x-custom"

# Pass fallback=None to get None back for unknowns (e.g. to branch on a miss).
# Omitting fallback keeps the default 'application/octet-stream'; existing callers are unaffected.
print(content_types.get_content_type("notes.unknownext", fallback=None))  # None
print(content_types.get_content_type("photo.jpg", fallback=None))  # "image/jpeg" (known, unaffected)

# Reverse lookup: MIME type -> extension (the inverse of get_content_type)
print(content_types.guess_extension("application/pdf"))  # ".pdf"
print(content_types.guess_extension("image/jpeg"))  # ".jpg" (the canonical pick)
print(content_types.guess_all_extensions("image/jpeg"))  # ['.jpg', '.jpeg', '.jpe']

# Matching is case-insensitive, and parameters on a Content-Type header are ignored
print(content_types.guess_extension("text/html; charset=utf-8"))  # ".html"

# Common non-canonical / legacy spellings resolve to the canonical type too
print(content_types.guess_extension("text/json"))  # ".json" (canonical: application/json)
print(content_types.guess_extension("image/jpg"))  # ".jpg" (canonical: image/jpeg)

# Pass with_dot=False for a bare extension; unknown types return None / []
print(content_types.guess_extension("application/toml", with_dot=False))  # "toml"
print(content_types.guess_extension("application/x-nope"))  # None
print(content_types.guess_all_extensions("application/x-nope"))  # []
```
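The URL and fallback behavior shown above can be approximated in a few lines. This is a sketch under stated assumptions rather than the library's actual code: `lookup`, `_MISSING`, and `TINY_TABLE` are hypothetical names, and the real implementation may differ. It strips the query string and fragment with `urllib.parse.urlsplit`, then uses a sentinel default so that an explicit `fallback=None` can be told apart from an omitted argument.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny table for illustration only.
TINY_TABLE = {".mp3": "audio/mpeg", ".jpg": "image/jpeg"}

# Sentinel that distinguishes "fallback not passed" from "fallback=None".
_MISSING = object()


def lookup(name, treat_as_binary=True, fallback=_MISSING):
    """Extension-only lookup mirroring the fallback rules described above."""
    # urlsplit drops "?query" and "#fragment"; plain filenames pass through.
    path = urlsplit(str(name)).path
    ext = posixpath.splitext(path)[1].lower()
    if ext in TINY_TABLE:
        return TINY_TABLE[ext]
    if fallback is not _MISSING:
        return fallback  # an explicit fallback wins, even when it is None
    return "application/octet-stream" if treat_as_binary else "text/plain"


print(lookup("https://cdn.example.com/song.mp3?cache_id=678c2a"))
print(lookup("notes.unknownext", treat_as_binary=False))
print(lookup("notes.unknownext", fallback=None))
```

The sentinel is the key design choice here: a plain `fallback=None` default could not express "return `None` on a miss" without changing behavior for callers who never pass the argument.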
The reverse lookup mirrors the standard library mimetypes.guess_extension /
guess_all_extensions, but draws on this library's larger, more-correct table, so the
canonical extension can differ (for example, it returns .html for text/html, not .htm).
It is also forgiving of common non-canonical spellings: text/json, image/jpg,
application/javascript, and application/x-zip-compressed all resolve to a sensible extension.
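One way to picture the reverse lookup is as three steps: drop any header parameters, normalize case, and map legacy spellings to a canonical type before consulting the table. The sketch below follows that outline. `ALIASES`, `EXTENSIONS`, and the two `*_sketch` functions are hypothetical names holding a handful of entries taken from the examples above; the library's real tables and logic may differ.

```python
# Hypothetical alias and extension tables, for illustration only.
ALIASES = {"text/json": "application/json", "image/jpg": "image/jpeg"}
EXTENSIONS = {
    "application/json": [".json"],
    "image/jpeg": [".jpg", ".jpeg", ".jpe"],  # first entry is the canonical pick
    "text/html": [".html"],
}


def guess_all_extensions_sketch(mime_type, with_dot=True):
    """Return every known extension for a MIME type, canonical one first."""
    # "Text/HTML; charset=utf-8" -> "text/html"
    bare = mime_type.split(";", 1)[0].strip().lower()
    canonical = ALIASES.get(bare, bare)
    found = EXTENSIONS.get(canonical, [])
    return list(found) if with_dot else [ext.lstrip(".") for ext in found]


def guess_extension_sketch(mime_type, with_dot=True):
    """Return the canonical extension, or None for an unknown type."""
    found = guess_all_extensions_sketch(mime_type, with_dot)
    return found[0] if found else None


print(guess_extension_sketch("Text/HTML; charset=utf-8"))
print(guess_extension_sketch("image/jpg", with_dot=False))
print(guess_all_extensions_sketch("application/x-nope"))
```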
CLI
To use the library as a CLI tool, just install it with uv or pipx.
```bash
uv tool install content-types
```
Now it will be available machine-wide.
```bash
content-types example.jpg
# Outputs: image/jpeg

content-types data.parquet
# Outputs: application/vnd.apache.parquet

content-types notebook.ipynb
# Outputs: application/x-ipynb+json

content-types photo.cr2
# Outputs: image/x-canon-cr2
```
More correct than Python's mimetypes
When I first learned about Python's mimetypes module, I thought it was exactly what I needed. However,
it doesn't have all the MIME types, and it recommends deprecated, out-of-date answers for some very common types.
For example, mimetypes maps .xml to text/xml where it should be application/xml
(see MDN).
And mimetypes is missing important types such as:
- .m4v -> video/mp4
- .tgz -> application/gzip
- .flac -> audio/flac
- .epub -> application/epub+zip
- .parquet -> application/vnd.apache.parquet
- .ipynb -> application/x-ipynb+json
- .mkv -> video/x-matroska
- .toml -> application/toml
- .yaml -> application/yaml
- .rs -> text/x-rust
- .go -> text/x-go
- .tsx -> text/tsx
- .psd -> image/vnd.adobe.photoshop
- .dwg -> application/acad
- ... and 300+ more
With this library, you get 360+ file extensions properly mapped, compared to Python's mimetypes
which only has around 100 and includes outdated MIME types.
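To see how your own interpreter's standard library handles some of the extensions listed above, a small check like the following can help. It uses only `mimetypes.MimeTypes` from the standard library. Results vary by Python version, so no expected output is shown. The `stdlib_guess` helper name is made up for this example.

```python
import mimetypes

# Extensions taken from the list above.
EXTENSIONS = [".xml", ".m4v", ".tgz", ".flac", ".epub", ".parquet", ".toml", ".yaml"]

# A fresh database instance, so the check does not depend on earlier add_type() calls.
db = mimetypes.MimeTypes()


def stdlib_guess(extension):
    """Ask the standard library for a type; returns None when it has no mapping."""
    guessed_type, _encoding = db.guess_type(f"file{extension}")
    return guessed_type


report = {ext: stdlib_guess(ext) for ext in EXTENSIONS}
for ext, guessed in report.items():
    print(f"{ext:10} {guessed}")
```

Any row that prints `None`, or a type that differs from the mapping listed above, is a case where the two tables disagree on your interpreter.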
Popular Format Examples
Here are some commonly used formats by category:
Data Science & Analytics:
- .parquet - Apache Parquet columnar storage
- .ipynb - Jupyter Notebooks
- .pkl, .pickle - Python pickle files
- .npy, .npz - NumPy arrays
- .arrow, .feather - Apache Arrow
- .hdf5, .h5 - HDF5 scientific data
- .mat - MATLAB data files
- .dta - Stata data files
- .sav - SPSS data files
Modern Programming Languages:
- .rs - Rust
- .go - Go/Golang
- .ts, .tsx - TypeScript/React
- .jsx - React JavaScript
- .vue - Vue.js components
- .swift - Swift
- .kt, .kts - Kotlin
- .dart - Dart
- .sol - Solidity (smart contracts)
Configuration & Infrastructure:
- .yaml, .yml - YAML configs
- .toml - TOML configs
- .env - Environment variables
- .dockerfile - Docker files
- .tf, .tfvars - Terraform
- .ini, .conf, .cfg - Configuration files
Creative & Design:
- .psd, .psb - Adobe Photoshop
- .indd - Adobe InDesign
- .aep - Adobe After Effects
- .dwg, .dxf - AutoCAD
- .skp - SketchUp
- .blend - Blender
- .cr2, .cr3 - Canon RAW
- .nef - Nikon RAW
- .dng - Adobe DNG RAW
Modern Media:
- .mkv - Matroska video
- .webp - WebP images
- .avif - AVIF images
- .opus - Opus audio
- .flac - FLAC audio
- .midi, .mid - MIDI
Works when python-magic package doesn't
Why not the excellent python-magic package? It works by reading the header bytes of
binary files, which requires access to the file data. The whole goal of this project is
to avoid needing the file data at all. The two libraries serve different use cases.
Contributing
Contributions are welcome! Check out the GitHub repo
for more details on how to get involved.
Development
pytest and ruff aren't declared dependencies; uv provides them on the fly:
```bash
# Run the test suite (67 tests)
uv run --with pytest pytest

# Lint and format (config in ruff.toml)
uvx ruff check .
uvx ruff format .
```
Building the docs
The docs site is built with Great Docs and
published at mkennedy.codes/docs/content-types.
Great Docs imports the package for API introspection, so the toolchain lives in the dev
extra and needs an editable install:
```bash
# Install the docs toolchain into your virtualenv
uv pip install -e ".[dev]"

# Build the site (mirrors great-docs/_site/ into the committed docs/ folder)
python scripts/build_docs.py

# Preview exactly as hosted, under the /docs/content-types subpath
python scripts/serve_docs.py
# -> http://127.0.0.1:8099/docs/content-types/
```
