dhondta/python-codext

<p align="center" id="top"><img src="https://github.com/dhondta/python-codext/raw/main/docs/pages/img/logo.png"></p> <h1 align="center">CodExt <a href="https://twitter.com/intent/tweet?text=CodExt%20-%20Encoding%2Fdecoding%20anything.%0D%0APython%20library%20extending%20the%20native%20codecs%20library%20with%20many%20new%20encodings%20and%20providing%20CLI%20tools%20with%20a%20guess%20feature%20based%20on%20AI.%0D%0Ahttps%3a%2f%2fgithub%2ecom%2fdhondta%2fpython-codext%0D%0A&hashtags=python,programming,encodings,codecs,cryptography,morse,base,ctftools"><img src="https://img.shields.io/badge/Tweet--lightgrey?logo=twitter&style=social" alt="Tweet" height="20"/></a></h1> <h3 align="center">Encode/decode anything.</h3>

CodExt is a Python 2/3-compatible library that extends the native `codecs` library (notably by adding new custom encodings and character mappings) and provides 120+ new codecs, hence its name, combining CODecs and EXTension. It also features a guess mode for decoding multiple layers of encoding, as well as CLI tools for convenience.
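CodExt builds on the standard `codecs` registry, so it may help to see how a custom codec plugs into that registry. The following minimal sketch uses only the standard library (`codecs.register` and `codecs.CodecInfo`). The `reverse-demo` name and its helper functions are hypothetical, and this is not how CodExt itself defines its codecs.

```python
import codecs


def _reverse(text, errors="strict"):
    # Codec functions return (output, number of input units consumed)
    return text[::-1], len(text)


def _search(name):
    # Depending on the Python version, the looked-up name may arrive with
    # hyphens turned into underscores, so accept both spellings
    if name in ("reverse_demo", "reverse-demo"):
        # Reversing is its own inverse: the same function encodes and decodes
        return codecs.CodecInfo(_reverse, _reverse, name="reverse-demo")
    return None  # unknown name: let other search functions handle it


codecs.register(_search)

print(codecs.encode("Test string", "reverse-demo"))  # gnirts tseT
print(codecs.decode("gnirts tseT", "reverse-demo"))  # Test string
```

Once a search function is registered, the generic `codecs.encode`/`codecs.decode` functions resolve the name. This is why the Python examples further down can call `codecs.encode(..., "base100")` after importing `codext`.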

```sh
$ pip install codext
```

| Want to contribute a new codec? | Want to contribute a new macro? |
|:---:|:---:|
| Check the documentation first<br>Then PR your new codec | PR your updated version of `macros.json` |

:mag: Demonstrations

:computer: Usage (main CLI tool) <a href="https://twitter.com/intent/tweet?text=CodExt%20-%20Encode%2Fdecode%20anything.%0D%0APython%20tool%20for%20encoding%20and%20decoding%20almost%20anything,%20including%20a%20guess%20feature%20based%20on%20AI.%0D%0Ahttps%3a%2f%2fgithub%2ecom%2fdhondta%2fpython-codext%0D%0A&hashtags=python,encodings,codecs,cryptography,morse,base,stegano,steganography,ctftools"><img src="https://img.shields.io/badge/Tweet%20(codext)--lightgrey?logo=twitter&style=social" alt="Tweet on codext" height="20"/></a>

```session
$ codext -i test.txt encode dna-1
GTGAGCGGGTATGTGA

$ echo -en "test" | codext encode morse
- . ... -

$ echo -en "test" | codext encode braille
⠞⠑⠎⠞

$ echo -en "test" | codext encode base100
👫👜👪👫
```

:chains: Chaining codecs

```sh
$ echo -en "Test string" | codext encode reverse
gnirts tseT

$ echo -en "Test string" | codext encode reverse morse
--. -. .. .-. - ... / - ... . -

$ echo -en "Test string" | codext encode reverse morse dna-2
AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC

$ echo -en "Test string" | codext encode reverse morse dna-2 octal
101107124103101107124103101107124107101107101101101107124103101107124107101107101101101107124107101107124107101107101101101107124107101107124103101107124107101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124124101107101101101107124103101107101101101107124107101107124107101107124107101107101101101107124107101107101101101107124103

$ echo -en "AGTCAGTCAGTGAGAAAGTCAGTGAGAAAGTGAGTGAGAAAGTGAGTCAGTGAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTTAGAAAGTCAGAAAGTGAGTGAGTGAGAAAGTGAGAAAGTC" | codext -d dna-2 morse reverse
test string
```
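The chaining above applies codecs from left to right, and the decoding example lists the same codecs in reverse order. The following sketch shows the same idea using only standard-library codecs. `chain_encode` and `chain_decode` are hypothetical helpers, not CodExt functions, and the `rot13`, `utf-8` and `base64` codecs stand in for CodExt's own.

```python
import codecs


def chain_encode(data, *encodings):
    # Apply each codec in the given order (text -> text, then text -> bytes, ...)
    for name in encodings:
        data = codecs.encode(data, name)
    return data


def chain_decode(data, *encodings):
    # Undo the same codecs, last one first
    for name in reversed(encodings):
        data = codecs.decode(data, name)
    return data


encoded = chain_encode("Test string", "rot13", "utf-8", "base64")
print(encoded)  # b'R3JmZyBmZ2V2YXQ=\n'
print(chain_decode(encoded, "rot13", "utf-8", "base64"))  # Test string
```

The stdlib `base64` codec appends a trailing newline, which its decoder accepts.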

:twisted_rightwards_arrows: Using macros

```sh
$ codext add-macro my-encoding-chain gzip base63 lzma base64

$ codext list macros
example-macro, my-encoding-chain

$ echo -en "Test string" | codext encode my-encoding-chain
CQQFAF0AAIAAABuTgySPa7WaZC5Sunt6FS0ko71BdrYE8zHqg91qaqadZIR2LafUzpeYDBalvE///ug4AA==

$ codext remove-macro my-encoding-chain

$ codext list macros
example-macro
```
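A macro behaves like a named, persisted codec chain. The following sketch assumes macros are stored as a JSON object mapping each name to a list of codec names. The file location, the format and all helper names (`add_macro`, `remove_macro`, `list_macros`, `encode_macro`, `decode_macro`) are hypothetical and may not match CodExt's `macros.json`. The sketch uses standard-library codecs (`zlib`, `base64`) so that it runs without CodExt.

```python
import codecs
import json
import os
import tempfile

# Hypothetical location and format: {"macro-name": ["codec1", "codec2", ...]}
MACROS_FILE = os.path.join(tempfile.gettempdir(), "macros-sketch.json")


def _load() -> dict:
    if not os.path.exists(MACROS_FILE):
        return {}
    with open(MACROS_FILE) as f:
        return json.load(f)


def _save(macros: dict) -> None:
    with open(MACROS_FILE, "w") as f:
        json.dump(macros, f, indent=2)


def add_macro(name: str, *encodings: str) -> None:
    macros = _load()
    macros[name] = list(encodings)
    _save(macros)


def remove_macro(name: str) -> None:
    macros = _load()
    macros.pop(name, None)
    _save(macros)


def list_macros() -> list:
    return sorted(_load())


def encode_macro(data, name: str):
    # Apply the stored codecs in order, like a chain
    for encoding in _load()[name]:
        data = codecs.encode(data, encoding)
    return data


def decode_macro(data, name: str):
    # Undo them in reverse order
    for encoding in reversed(_load()[name]):
        data = codecs.decode(data, encoding)
    return data


add_macro("my-encoding-chain", "zlib", "base64")
encoded = encode_macro(b"Test string", "my-encoding-chain")
print(decode_macro(encoded, "my-encoding-chain"))  # b'Test string'
remove_macro("my-encoding-chain")
```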

:desktop_computer: Usage (`baseXX` CLI tools) <a href="https://twitter.com/intent/tweet?text=UnBase%20-%20Decode%20any%20multi-layer%20base-encoded%20string.%0D%0APython%20tool%20for%20decoding%20any%20base-encoded%20string,%20even%20when%20encoded%20with%20multiple%20layers.%0D%0Ahttps%3a%2f%2fgithub%2ecom%2fdhondta%2fpython-codext%0D%0A&hashtags=python,base,encodings,codecs,cryptography,stegano,steganography,ctftools"><img src="https://img.shields.io/badge/Tweet%20(unbase)--lightgrey?logo=twitter&style=social" alt="Tweet on unbase" height="20"/></a>

Playing with base encodings.

```session
$ echo "Test string !" | base122
*.7!ft9�-f9Â

$ echo "Test string !" | base91
"ONK;WDZM%Z%xE7L

$ echo "Test string !" | base91 | base85
B2P|BJ6A+nO(j|-cttl%

$ echo "Test string !" | base91 | base85 | base36 | base58-flickr
QVx5tvgjvCAkXaMSuKoQmCnjeCV1YyyR3WErUUErFf

$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | base58-flickr -d | base36 -d | base85 -d | base91 -d
Test string !
```

```session
$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -m 3
Test string !

$ echo "Test string !" | base91 | base85 | base36 | base58-flickr | unbase -f Test
Test string !
```
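The `unbase` examples peel off several layers without being told which bases were used. One way to do this is a bounded breadth-first search over candidate decoders, stopping when a predicate matches, as the `Test` search term above suggests. The following sketch uses only the standard `base64` module. The decoder set, the depth limit and the stop predicate are assumptions, not a description of how `unbase` or its flags work internally.

```python
import base64

# Hypothetical decoder set; unbase's real list of bases is much larger
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": lambda data: base64.b64decode(data, validate=True),
    "base85": base64.b85decode,
}


def unbase_sketch(data, found, max_depth=3):
    """Breadth-first search over decoding layers until found(result) holds."""
    frontier = [(data, [])]
    for _ in range(max_depth):
        next_frontier = []
        for current, path in frontier:
            for name, decode in DECODERS.items():
                try:
                    out = decode(current)
                except ValueError:  # binascii.Error is a subclass of ValueError
                    continue
                if not out or out == current:
                    continue
                if found(out):
                    return out, path + [name]
                next_frontier.append((out, path + [name]))
        frontier = next_frontier
    return None, []


layered = base64.b16encode(base64.b32encode(base64.b64encode(b"Test string !")))
result, path = unbase_sketch(layered, lambda d: b"Test" in d)
print(result, path)  # b'Test string !' ['base16', 'base32', 'base64']
```

Wrong guesses that still decode (for example, hex text is also valid Base64) simply become extra branches, so the search width grows quickly with depth.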

:computer: Usage (CLI)

Listing codecs.

```session
$ codext list encodings
a1z26                      adler32               affine             alternative-rot        ascii           
atbash                     autoclave             bacon              barbie                 base            
base1                      base2                 base3              base4                  base8           
<<snipped>>
```

Finding a codec based on a name.

```session
$ codext search bitcoin
base58
```

Encoding a string.

```session
$ echo -en "This is a test" | codext encode polybius
44232443 2443 11 44154344
```
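The Polybius output above can be reproduced by hand. The following minimal sketch assumes the classic 5x5 square (A-Z without J, with J merged into I), with each letter written as its row digit followed by its column digit. It only handles ASCII letters in whitespace-separated words. CodExt's codec may support other squares or characters.

```python
import string

# Classic 5x5 square: J is merged into I, leaving 25 letters
SQUARE = string.ascii_uppercase.replace("J", "")


def polybius_encode(text: str) -> str:
    words = []
    for word in text.split():
        pairs = []
        for ch in word.upper().replace("J", "I"):
            idx = SQUARE.index(ch)  # raises ValueError for non-letters
            pairs.append(f"{idx // 5 + 1}{idx % 5 + 1}")  # row, then column
        words.append("".join(pairs))
    return " ".join(words)


def polybius_decode(code: str) -> str:
    words = []
    for group in code.split():
        pairs = (group[i:i + 2] for i in range(0, len(group), 2))
        words.append("".join(SQUARE[(int(r) - 1) * 5 + int(c) - 1] for r, c in pairs))
    return " ".join(words)


print(polybius_encode("This is a test"))  # 44232443 2443 11 44154344
print(polybius_decode("44232443 2443 11 44154344"))  # THIS IS A TEST
```

Case is lost on the way through the square, so decoding returns uppercase text.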

Encoding a file.

```session
$ echo -en "this is a test" > to_be_encoded.txt
$ codext encode base64 < to_be_encoded.txt > text.b64
$ cat text.b64
dGhpcyBpcyBhIHRlc3Q=
```

Chaining codecs.

```session
$ echo -en "mrdvm6teie6t2cq=" | codext encode upper | codext decode base32 | codext decode base64
test
```
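Each step of that pipeline can be reproduced with the standard `base64` module. The intermediate value shows that the Base32 layer wraps a Base64 string followed by a newline. In the stdlib, `b32decode(..., casefold=True)` would make the separate `upper` step unnecessary.

```python
import base64

step1 = "mrdvm6teie6t2cq=".upper()  # like `codext encode upper`
step2 = base64.b32decode(step1)     # like `codext decode base32`
step3 = base64.b64decode(step2)     # like `codext decode base64`
print(step2)  # b'dGVzdA==\n'
print(step3)  # b'test'
```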

Iteratively guessing decodings.

```session
$ echo -en "test" | codext encode base64 gzip | codext guess
Codecs: gzip
dGVzdA==
$ echo -en "test" | codext encode base64 gzip | codext guess gzip -i base
Codecs: gzip, base64
test
```

:snake: Usage (Python)

Getting the list of available codecs.

```python
>>> import codext

>>> codext.list()
['ascii85', 'base85', 'base100', 'base122', ..., 'tomtom', 'dna', 'html', 'markdown', 'url', 'resistor', 'sms', 'whitespace', 'whitespace-after-before']
```

Playing with some base encodings.

```python
>>> import codecs

>>> codext.encode("this is a test", "base58-bitcoin")
'jo91waLQA1NNeBmZKUF'

>>> codext.encode("this is a test", "base58-ripple")
'jo9rA2LQwr44eBmZK7E'

>>> codext.encode("this is a test", "base58-url")
'JN91Wzkpa1nnDbLyjtf'

>>> codecs.encode("this is a test", "base100")
'👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫'

>>> codecs.decode("👫👟👠👪🐗👠👪🐗👘🐗👫👜👪👫", "base100")
'this is a test'
```
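The Base58 variants shown above differ only by alphabet. The input is read as one big-endian integer and written in base 58, and each leading zero byte becomes one leading "zero" digit. The following sketch assumes the widely used Bitcoin and Ripple alphabets. It is a from-scratch illustration, not CodExt's code. The two outputs above are consistent with this design, since the Ripple string is a character-for-character translation of the Bitcoin one.

```python
BITCOIN = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
RIPPLE = "rpshnaf39wBUDNEGHJKLM4PQRST7VWXYZ2bcdeCg65jkm8oFqi1tuvAxyz"


def b58encode(data: bytes, alphabet: str = BITCOIN) -> str:
    # Treat the whole input as one big-endian integer and write it in base 58
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(alphabet[rem])
    # Each leading zero byte is kept as one leading "zero" digit
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return alphabet[0] * zeros + "".join(reversed(digits))


def b58decode(text: str, alphabet: str = BITCOIN) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + alphabet.index(ch)  # raises ValueError on foreign characters
    zeros = len(text) - len(text.lstrip(alphabet[0]))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


print(b58encode(b"Hello World!"))  # 2NEpo7TZRRrLZSi2U
print(b58decode(b58encode(b"this is a test", RIPPLE), RIPPLE))  # b'this is a test'
```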

Playing with some cryptography-based codecs.

```python
>>> codext.encode("This is a test !", "vigenere-MYSECRETKET")
'Ffaw kj e mowm !'

>>> codext.encode("This is a test !", "autoclave-SECRET")
'Llkj ml t amkb !'
```
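Both ciphers shift each letter by the value of a key letter (A=0 to Z=25), skip non-letters and preserve case. Autoclave extends the key with the plaintext itself instead of repeating it. The following encode-only sketch reproduces the two outputs above under those assumptions. It is not CodExt's implementation, and it expects alphabetic keys.

```python
import string


def _shift(ch: str, k: int) -> str:
    base = ord("A") if ch.isupper() else ord("a")
    return chr((ord(ch) - base + k) % 26 + base)


def _apply(text: str, keystream: list) -> str:
    # Only ASCII letters consume key material; everything else passes through
    out, i = [], 0
    for ch in text:
        if ch in string.ascii_letters:
            out.append(_shift(ch, keystream[i]))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def vigenere_encode(text: str, key: str) -> str:
    shifts = [ord(c) - ord("A") for c in key.upper()]
    n_letters = sum(ch in string.ascii_letters for ch in text)
    return _apply(text, [shifts[i % len(shifts)] for i in range(n_letters)])


def autoclave_encode(text: str, key: str) -> str:
    # Keystream = the key, followed by the plaintext letters themselves
    stream = key.upper() + "".join(c for c in text.upper() if c in string.ascii_uppercase)
    return _apply(text, [ord(c) - ord("A") for c in stream])


print(vigenere_encode("This is a test !", "MYSECRETKET"))  # Ffaw kj e mowm !
print(autoclave_encode("This is a test !", "SECRET"))     # Llkj ml t amkb !
```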

Encoding/decoding with various other codecs.

```python
>>> for i in range(8):
...     print(codext.encode("this is a test", "dna-%d" % (i + 1)))
...
GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA
CTCACGGACGGCCTATAGAACGGCCTATAGAACGACAGAACTCACGCCCTATCTCA
ACAGATTGATTAACGCGTGGATTAACGCGTGGATGAGTGGACAGATAAACGCACAG
AGACATTCATTAAGCGCTCCATTAAGCGCTCCATCACTCCAGACATAAAGCGAGAC
TCTGTAAGTAATTCGCGAGGTAATTCGCGAGGTAGTGAGGTCTGTATTTCGCTCTG
TGTCTAACTAATTGCGCACCTAATTGCGCACCTACTCACCTGTCTATTTGCGTGTC
GAGTGCCTGCCGGATATCTTGCCGGATATCTTGCTGTCTTGAGTGCGGGATAGAGT
CACTCGGTCGGCCATATGTTCGGCCATATGTTCGTCTGTTCACTCGCCCATACACT
>>> codext.decode("GTGAGCCAGCCGGTATACAAGCCGGTATACAAGCAGACAAGTGAGCGGGTATGTGA", "dna-1")
'this is a test'

>>> codecs.encode("this is a test", "morse")
'- .... .. ... / .. ... / .- / - . ... -'

>>> codecs.decode("- .... .. ... / .. ... / .- / - . ... -", "morse")
'this is a test'

>>> with open("morse.txt", "w", encoding="morse") as f:
...     f.write("this is a test")
...
14

>>> with open("morse.txt", encoding="morse") as f:
...     f.read()
...
'this is a test'

>>> print(codext.encode("An example test string", "baudot-tape"))
***.**
   . *
***.* 
*  .  
   .* 
*  .* 
   . *
** .* 
***.**
** .**
   .* 
*  .  
* *. *
   .* 
* *.  
* *. *
*  .  
* *.  
* *. *
***.  
  *.* 
***.* 
 * .*
```

Checksums

- adler: Adler-32 algorithm (relies on zlib)
- crc: CRC of bit lengths 8, 10-17, 21, 24, 30-32, 40, 64 and 82, with a variety of polynomials
- luhn: Luhn mod N algorithm
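Adler-32 and Luhn are short enough to write out. The following sketch computes Adler-32 by hand and checks it against `zlib.adler32`, which the `adler` codec is described as relying on. It implements only the base-10 case of Luhn, whereas the `luhn` codec is described as supporting mod N. These are illustrations, not CodExt's code.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # running sum of bytes
        b = (b + a) % MOD_ADLER     # running sum of the sums
    return (b << 16) | a


def luhn_is_valid(number: str) -> bool:
    # Luhn mod 10: double every second digit from the right
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(hex(adler32(b"Wikipedia")))                           # 0x11e60398
print(adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia"))  # True
print(luhn_is_valid("79927398713"))                         # True
```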

Common

- a1z26: keeps words whitespace-separated and uses a custom character separator
- cases: set of case-related encodings (including camel-, kebab-, lower-, pascal-, upper-, snake- and swap-case, slugify, capitalize, title)
- dummy: set of simple encodings (including integer, replace, reverse, word-reverse, substitute and strip-spaces)
- octal: dummy octal conversion (converts to 3-digit groups)
- octal-spaced: variant of octal; dummy octal conversion, handling whitespace separators
- ordinal: dummy character ordinals conversion (converts to 3-digit groups)
- ordinal-spaced: variant of ordinal; dummy character ordinals conversion, handling whitespace separators
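The `octal` and `ordinal` descriptions can be read as one fixed-width, 3-digit group per character. This reading matches the `octal` output in the chaining example (`A` → `101`, `G` → `107`). The following sketch assumes that reading, and assumes that `ordinal` uses decimal ordinals. Fixed width only holds for code points below 512 (octal) or 1000 (decimal).

```python
def octal_encode(text: str) -> str:
    # One zero-padded 3-digit octal group per character
    return "".join(f"{ord(c):03o}" for c in text)


def octal_decode(digits: str) -> str:
    return "".join(chr(int(digits[i:i + 3], 8)) for i in range(0, len(digits), 3))


def ordinal_encode(text: str) -> str:
    # One zero-padded 3-digit decimal group per character
    return "".join(f"{ord(c):03d}" for c in text)


def ordinal_decode(digits: str) -> str:
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


print(octal_encode("AGTC"))    # 101107124103
print(ordinal_encode("AGTC"))  # 065071084067
```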

Compression

- gzip: standard Gzip compression/decompression
- lz77: compresses the given data with the Lempel-Ziv algorithm of 1977 (LZ77)
- lz78: compresses the given data with the Lempel-Ziv algorithm of 1978 (LZ78)
- pkzip_deflate: standard Zip-deflate compression/decompression
- pkzip_bzip2: standard BZip2 compression/decompression
- pkzip_lzma: standard LZMA compression/decompression

:warning: Compression functions are, of course, NOT encoding functions; they are implemented to leverage the `.encode(...)` API from `codecs`.
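As a rough illustration of the technique behind `lz77`, here is a textbook LZ77 sketch. It emits `(offset, length, next_char)` triples over a sliding window. The triple format, the window size and the brute-force match search are assumptions chosen for readability. CodExt's `lz77` codec likely serializes its output differently.

```python
def lz77_compress(data: str, window: int = 255) -> list:
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may overlap position i, but one char is kept for "next"
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples: list) -> str:
    buf = []
    for off, length, nxt in triples:
        for _ in range(length):
            buf.append(buf[-off])  # copy one char at a time, so overlaps work
        buf.append(nxt)
    return "".join(buf)


print(lz77_compress("aaaaaaaaaa"))  # [(0, 0, 'a'), (1, 8, 'a')]
```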

Cryptography

:warning: Crypto functions are, of course, NOT encoding functions; they are implemented to leverage the `.encode(...)` API from `codecs`.

Hashing

- blake: includes BLAKE2b and BLAKE2s (Python 3 only; relies on hashlib)
- crypt: Unix's crypt hash for passwords (Python 3 and Unix only; relies on crypt)
- md: aka Message Digest; includes MD4 and MD5 (relies on hashlib)
- sha: aka Secure Hash Algorithms; includes SHA1, SHA224, SHA256, SHA384 and SHA512 (Python 2/3), but also SHA3-224, -256, -384 and -512 (Python 3 only; relies on hashlib)
- shake: aka SHAKE hashing (Python 3 only; relies on hashlib)

:warning: Hash functions are, of course, NOT encoding functions; they are implemented for convenience with the `.encode(...)` API from `codecs` and are useful for chaining codecs.
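Since these codecs rely on `hashlib`, a hash step in a chain amounts to a one-way digest. The following sketch shows such a step with `hashlib.new`. The `hash_encode` helper is hypothetical. SHAKE needs an explicit output length because it is an extendable-output function. Whether MD4 is available through `hashlib.new("md4")` depends on the underlying OpenSSL build.

```python
import hashlib


def hash_encode(text: str, algorithm: str, length: int = 32) -> str:
    h = hashlib.new(algorithm, text.encode("utf-8"))
    # SHAKE is an extendable-output function, so a digest length is required
    if algorithm.startswith("shake_"):
        return h.hexdigest(length)
    return h.hexdigest()


print(hash_encode("", "sha1"))  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(hash_encode("test", "sha3_256"))
print(hash_encode("test", "shake_128", 16))  # 16 bytes -> 32 hex characters
```

There is no matching decode step: a digest cannot be turned back into its input.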

Languages

Others

- dna: implements the 8 rules of DNA sequences (`dna-N`, with N in [1,8])
- letter-indices: encodes consonants and/or vowels with their corresponding indices
- markdown: unidirectional encoding from Markdown to HTML
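The earlier `dna-1` example shows how such a rule works. Each byte is split into four 2-bit pairs, and each pair becomes one nucleotide. The mapping used here (00→A, 01→G, 10→C, 11→T) is inferred from that example's output, not taken from CodExt's source. The other seven rules would use different pair-to-nucleotide mappings.

```python
# Mapping inferred from the dna-1 example output (assumption)
RULE1 = {"00": "A", "01": "G", "10": "C", "11": "T"}
REVERSE1 = {base: bits for bits, base in RULE1.items()}


def dna1_encode(text: str) -> str:
    bits = "".join(f"{b:08b}" for b in text.encode("utf-8"))
    # Every 2 bits become one nucleotide, so each byte yields 4 letters
    return "".join(RULE1[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna1_decode(seq: str) -> str:
    bits = "".join(REVERSE1[base] for base in seq)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(dna1_encode("test"))  # GTGAGCGGGTATGTGA
```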

Steganography

- hexagram: uses Base64 and encodes the result to a charset of I Ching hexagrams (as implemented here)
- klopf: aka Klopf code; Polybius square with trivial alphabetical distribution
- resistor: aka resistor color codes
- rick: aka Rick cipher (in reference to Rick Astley's song "Never Gonna Give You Up")
- sms: also called T9 code; uses "-" as a separator for encoding, and "-", "_" or whitespace for decoding
- whitespace: replaces bits with spaces and tabs
- whitespace_after_before: variant of whitespace; encodes characters as new characters with whitespaces before and after, according to an equation described in the codec name (e.g. "whitespace+2*after-3*before")
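A minimal sketch of the `whitespace` idea follows. It assumes each byte is written as 8 bits, most significant first, with 0 shown as a space and 1 as a tab. CodExt's actual bit order and symbol mapping may differ, and `whitespace_after_before` is not covered here.

```python
def whitespace_encode(text: str) -> str:
    # Assumption: bit 0 -> space, bit 1 -> tab, 8 bits per byte (MSB first)
    bits = "".join(f"{b:08b}" for b in text.encode("utf-8"))
    return bits.replace("0", " ").replace("1", "\t")


def whitespace_decode(ws: str) -> str:
    bits = ws.replace(" ", "0").replace("\t", "1")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("utf-8")


hidden = whitespace_encode("test")
print(repr(whitespace_encode("A")))  # ' \t     \t'
print(whitespace_decode(hidden))     # test
```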