Lzav

Fast In-Memory Data Compression Algorithm (header-only C/C++): 480+ MB/s compression, 2800+ MB/s decompression, and a better compression ratio than LZ4, Snappy, and Zstd at level -1

From avaneev · Updated June 15, 2026

The project is written primarily in C++, is distributed under the MIT License, and was first published in 2023. Key topics include: compress, compression, compression-algorithm, compression-library, compressor.

LZAV - Fast Data Compression Algorithm (in C/C++)

Introduction

LZAV is a fast general-purpose in-memory data compression algorithm based on
the now-classic LZ77 lossless data compression method. Among the many similar
in-memory (non-streaming) compression algorithms, LZAV holds a good position on
the Pareto landscape of factors.

The LZAV algorithm's code is portable, cross-platform, scalar, header-only,
and inlinable C (compatible with C++). It supports big- and little-endian
platforms and any memory alignment model. The algorithm is efficient on both
32- and 64-bit platforms. Incompressible data expands only negligibly. The
code is compliant with WebAssembly (WASI libc) and runs there at about half
the performance of native code.

LZAV does not sacrifice internal out-of-bounds (OOB) checks for decompression
speed. This means LZAV can be used in strict conditions where OOB memory
writes (and especially reads) that lead to a trap are unacceptable (e.g.,
real-time, system, or server software). LZAV can be used safely (without
crashes or undefined behavior) even when decompressing malformed or damaged
compressed data, so no checksum (or hash) of the compressed data is needed.
At most, a checksum of the uncompressed data may be required, depending on an
application's guarantees.

The internal functions available in the lzav.h file allow you to easily
implement, and experiment with, your own compression algorithms. The LZAV
stream format and decompressor can reach high decompression speeds and
compression ratios, depending on how the data is compressed.

Usage

To compress data:

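```c
#include <stdlib.h> // malloc
#include "lzav.h"

// Worst-case compressed size for src_len bytes of input.
int max_len = lzav_compress_bound( src_len );
void* comp_buf = malloc( max_len );
int comp_len = lzav_compress_default( src_buf, comp_buf, src_len, max_len );

if( comp_len == 0 && src_len != 0 )
{
    // Error handling: a zero result for non-empty input signals failure.
}
```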

To decompress data:

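```c
#include <stdlib.h> // malloc
#include "lzav.h"

// src_len is the original (uncompressed) length, stored by the application.
void* decomp_buf = malloc( src_len );
int l = lzav_decompress( comp_buf, decomp_buf, comp_len, src_len );

if( l < 0 )
{
    // Error handling: a negative result signals failure (e.g., damaged data).
}
```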

To compress data with a higher ratio for non-time-critical uses (e.g.,
compression of an application's static assets):

```c
#include <stdlib.h> // malloc
#include "lzav.h"

int max_len = lzav_compress_bound_hi( src_len ); // Note another bound function!
void* comp_buf = malloc( max_len );
int comp_len = lzav_compress_hi( src_buf, comp_buf, src_len, max_len );

if( comp_len == 0 && src_len != 0 )
{
    // Error handling: a zero result for non-empty input signals failure.
}
```

The LZAV algorithm and its source code (which conforms to ISO C99) were
quality-tested with Clang, GCC, MSVC, and Intel C++ compilers; on x86, x86-64
(Intel, AMD), and AArch64 (Apple Silicon) architectures; and on Windows 10,
AlmaLinux 9.3, and macOS 15.7. Full C++ compliance is enabled conditionally and
automatically when the source code is compiled with a C++ compiler.

Ports

Customizing C++ namespace

In C++ environments where it is undesirable to export LZAV symbols into the
global namespace, the LZAV_NS_CUSTOM macro can be defined externally:

```c
#define LZAV_NS_CUSTOM lzav
#include "lzav.h"
```

Similarly, LZAV symbols can be placed into any other custom namespace (e.g.,
a namespace with data compression functions):

```c
#define LZAV_NS_CUSTOM my_namespace
#include "lzav.h"
```

This way, LZAV symbols and functions can be referenced as
my_namespace::lzav_compress_default(...). Note that since all LZAV functions
have the static specifier, there can be no ABI conflicts, even if the LZAV
header is included in unrelated, mixed C/C++ compilation units.

Comparisons

The tables below present ballpark performance numbers for the LZAV algorithm
(based on the Silesia dataset).

While LZ4 seems to compress faster, LZAV comparatively provides 15.5% memory
storage cost savings. This is a significant benefit in database and file
system use cases. Compression is only about 35% slower, CPUs rarely run at
their maximum capacity anyway (cached data writes are deferred to background
threads), and better compression reduces disk I/O times. In general, LZAV
holds a very strong position in this class of data compression algorithms
when one considers all factors: compression and decompression speeds,
compression ratio, and, just as important, code maintainability. LZAV is
maximally portable and has a rather small, independent codebase.

The performance of LZAV is not limited to the presented ballpark numbers.
Depending on the data being compressed, LZAV can reach 800 MB/s compression
and 5000 MB/s decompression speeds. Incompressible data decompresses at
10000 MB/s, which is not far from memcpy speed. On some datasets, such as
enwik9, LZAV provides 22% memory storage savings compared to LZ4.
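Relative storage savings like these come straight from the ratio columns in the tables further down. If compressor A produces r_A percent of the original size and compressor B produces r_B percent, then A's output is 1 - r_A / r_B smaller. A minimal sketch of that arithmetic follows; the function name is illustrative only.

```c
/* Relative storage saving of compressor A over compressor B, given their
 * compression ratios as a percentage of the original size (lower is
 * better). Returns a fraction, e.g. 0.22 for 22%. */
static double storage_saving( double ratio_a_pct, double ratio_b_pct )
{
    return( 1.0 - ratio_a_pct / ratio_b_pct );
}
```

The Datasets Benchmark table gives enwik9 ratios of 39.39% for LZAV 5.8 and 50.92% for LZ4 1.9.4. With those values, `storage_saving( 39.39, 50.92 )` returns about 0.226, in line with the 22% figure above. The Silesia ratios (39.94% and 47.60%) give about 0.161. Applying the same formula to the MB/s columns gives relative speed differences.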

The geomean performance of the LZAV algorithm on a variety of datasets is
550 ± 150 MB/s compression and 3800 ± 1300 MB/s decompression speed,
on 4+ GHz 64-bit processors released since 2019. Note that the algorithm
exhibits adaptive qualities, and its actual performance depends on the data
being compressed. LZAV may show exceptional performance on your specific
data, including, but not limited to, sparse databases, log files, and HTML/XML
files.
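For reference, the geometric mean of n per-dataset speeds x_1 to x_n is defined as follows. It weights relative (multiplicative) differences equally across datasets. For example, the geometric mean of 400 MB/s and 900 MB/s is √(400 · 900) = 600 MB/s.

```latex
\mathrm{GM}(x_1,\dots,x_n) = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
                           = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr)
```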

It is also worth noting that compression methods like LZAV and LZ4 usually
have an advantage over dictionary- and entropy-based coding. Hash-table-based
compression has a small memory and operational overhead, while classic LZ77
decompression has no overhead at all. This is especially relevant for smaller
data.
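A toy decoder shows why classic LZ77 decompression needs no auxiliary memory. It reads a hypothetical token format that is NOT LZAV's actual stream format:

- A control byte c < 0x80 introduces c + 1 literal bytes.
- A control byte c >= 0x80 is a back-reference of (c & 0x7F) + 3 bytes. A 16-bit little-endian offset follows it.

The decoder only copies bytes; it uses no hash tables or dictionaries. In the spirit of the OOB checks described in the introduction, every read and write is checked against the buffer bounds, so malformed input yields an error instead of an out-of-bounds access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy LZ77 decoder for a hypothetical token format (NOT the LZAV format).
 * Returns the number of bytes written to dst, or -1 on malformed input. */
static long toy_lz77_decode( const uint8_t* src, size_t src_len,
    uint8_t* dst, size_t dst_cap )
{
    size_t ip = 0; /* Input position. */
    size_t op = 0; /* Output position; never exceeds dst_cap. */

    while( ip < src_len )
    {
        const uint8_t c = src[ ip++ ];

        if( c < 0x80 )
        {
            const size_t n = (size_t) c + 1; /* Literal run length. */

            /* Literals must be present in src and must fit into dst. */
            if( n > src_len - ip || n > dst_cap - op )
            {
                return( -1 );
            }

            memcpy( dst + op, src + ip, n );
            ip += n;
            op += n;
        }
        else
        {
            const size_t n = (size_t) ( c & 0x7F ) + 3; /* Match length. */
            size_t off;

            if( src_len - ip < 2 )
            {
                return( -1 ); /* Truncated offset field. */
            }

            off = (size_t) src[ ip ] | (size_t) src[ ip + 1 ] << 8;
            ip += 2;

            /* The offset must point into already-decoded output, and the
             * match must fit into dst. */
            if( off == 0 || off > op || n > dst_cap - op )
            {
                return( -1 );
            }

            /* Byte-by-byte copy, so overlapping matches (off < n) repeat
             * recently decoded output, as in classic LZ77. */
            for( size_t i = 0; i < n; i++, op++ )
            {
                dst[ op ] = dst[ op - off ];
            }
        }
    }

    return( (long) op );
}
```

For example, the tokens `0x02 'a' 'b' 'c' 0x83 0x03 0x00` decode to "abcabcabc". That is three literals, then a 6-byte match at offset 3.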

For a more comprehensive benchmark of in-memory compression algorithms,
see lzbench.

Apple clang 15.0.0 arm64, macOS 15.7, Apple M1, 3.5 GHz

Silesia compression corpus

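| Compressor      | Compression | Decompression | Ratio % |
|-----------------|-------------|---------------|---------|
| LZAV 5.8        | 625 MB/s    | 3790 MB/s     | 39.94   |
| LZ4 1.9.4       | 700 MB/s    | 4570 MB/s     | 47.60   |
| Snappy 1.1.10   | 495 MB/s    | 3230 MB/s     | 48.22   |
| LZF 3.6         | 395 MB/s    | 800 MB/s      | 48.15   |
| LZAV 5.8 HI     | 134 MB/s    | 3700 MB/s     | 35.12   |
| LZ4HC 1.9.4 -9  | 40 MB/s     | 4360 MB/s     | 36.75   |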

LLVM clang 19.1.7 x86-64, AlmaLinux 9.3, Xeon E-2386G (RocketLake), 5.1 GHz

Silesia compression corpus

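| Compressor      | Compression | Decompression | Ratio % |
|-----------------|-------------|---------------|---------|
| LZAV 5.8        | 620 MB/s    | 3490 MB/s     | 39.94   |
| LZ4 1.9.4       | 848 MB/s    | 4980 MB/s     | 47.60   |
| Snappy 1.1.10   | 690 MB/s    | 3360 MB/s     | 48.22   |
| LZF 3.6         | 455 MB/s    | 1000 MB/s     | 48.15   |
| LZAV 5.8 HI     | 115 MB/s    | 3330 MB/s     | 35.12   |
| LZ4HC 1.9.4 -9  | 43 MB/s     | 4920 MB/s     | 36.75   |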

LLVM clang-cl 18.1.8 x86-64, Windows 10, Ryzen 3700X (Zen2), 4.2 GHz

Silesia compression corpus

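| Compressor      | Compression | Decompression | Ratio % |
|-----------------|-------------|---------------|---------|
| LZAV 5.8        | 525 MB/s    | 3060 MB/s     | 39.94   |
| LZ4 1.9.4       | 675 MB/s    | 4560 MB/s     | 47.60   |
| Snappy 1.1.10   | 415 MB/s    | 2440 MB/s     | 48.22   |
| LZF 3.6         | 310 MB/s    | 700 MB/s      | 48.15   |
| LZAV 5.8 HI     | 116 MB/s    | 2970 MB/s     | 35.12   |
| LZ4HC 1.9.4 -9  | 36 MB/s     | 4430 MB/s     | 36.75   |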

P.S. The popular Zstd was not included in these benchmarks. It is not a pure
LZ77 compressor, it is much harder to integrate, and it has a much larger code
size: a different league, closer to zlib. Here are the author's Zstd
measurements, made with TurboBench on a Ryzen 3700X using the Silesia dataset:

| Compressor     | Compression | Decompression | Ratio % |
|----------------|-------------|---------------|---------|
| zstd 1.5.5 -1  | 460 MB/s    | 1870 MB/s     | 41.0    |
| zstd 1.5.5 1   | 436 MB/s    | 1400 MB/s     | 34.6    |

Datasets Benchmark

This section presents compression ratio comparisons for various popular
datasets. Ratio % is the compressed size as a percentage of the original size,
so lower is better. Note that each file within these datasets was compressed
individually, which contributed to the overall ratio.

| Dataset               | Size, MiB | LZAV 5.8 | LZ4 1.9.4 | Snappy 1.1.10 | LZF 3.6 | Source                   |
|-----------------------|-----------|----------|-----------|---------------|---------|--------------------------|
| 4SICS 151020 PCAP     | 24.5      | 20.47    | 21.82     | 24.95         | 25.34   | www.netresec.com         |
| 4SICS 151022 PCAP     | 200.0     | 36.45    | 37.35     | 40.24         | 41.37   | www.netresec.com         |
| Calgary Large         | 3.1       | 44.29    | 51.97     | 51.76         | 49.07   | data-compression.info    |
| Canterbury            | 2.68      | 38.07    | 43.73     | 45.42         | 42.49   | corpus.canterbury.ac.nz  |
| Canterbury Large      | 10.6      | 38.25    | 51.97     | 48.37         | 54.28   | corpus.canterbury.ac.nz  |
| Canterbury Artificial | 0.29      | 33.36    | 33.74     | 36.48         | 34.66   | corpus.canterbury.ac.nz  |
| employees_10KB.json   | 0.01      | 22.55    | 24.68     | 23.92         | 23.52   | sample.json-format.com   |
| employees_100KB.json  | 0.10      | 15.96    | 17.71     | 19.02         | 21.88   | sample.json-format.com   |
| employees_50MB.json   | 51.5      | 10.78    | 16.42     | 18.57         | 21.44   | sample.json-format.com   |
| enwik8                | 95.4      | 44.61    | 57.26     | 56.56         | 53.95   | www.mattmahoney.net      |
| enwik9                | 954.7     | 39.39    | 50.92     | 50.79         | 49.30   | www.mattmahoney.net      |
| Manzini               | 855.3     | 26.98    | 37.30     | 38.57         | 39.04   | people.unipmn.it/manzini |
| chr22.dna (Manzini)   | 33.0      | 38.79    | 52.82     | 44.53         | 55.86   | people.unipmn.it/manzini |
| w3c2 HTML (Manzini)   | 99.4      | 11.43    | 22.20     | 25.35         | 27.20   | people.unipmn.it/manzini |
| Silesia               | 202.1     | 39.94    | 47.60     | 48.17         | 48.15   | github.com/MiloszKrajewski |
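When files are compressed individually, per-file results can be combined into one dataset ratio by summing sizes. The sketch below assumes this size-weighted aggregation: total compressed bytes divided by total original bytes. The text above does not state the exact aggregation method behind the table. `struct file_result` and `overall_ratio_pct` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file result of compressing one file individually. */
struct file_result
{
    uint64_t orig_size; /* Uncompressed size, bytes. */
    uint64_t comp_size; /* Compressed size, bytes. */
};

/* Size-weighted overall ratio in percent (lower is better):
 * 100 * sum( comp_size ) / sum( orig_size ). Returns 0.0 for empty input. */
static double overall_ratio_pct( const struct file_result* r, size_t n )
{
    uint64_t orig = 0;
    uint64_t comp = 0;

    for( size_t i = 0; i < n; i++ )
    {
        orig += r[ i ].orig_size;
        comp += r[ i ].comp_size;
    }

    return( orig == 0 ? 0.0 : 100.0 * (double) comp / (double) orig );
}
```

This differs from a plain average of per-file ratios. Take two files of 100 and 300 bytes that compress to 40% and 50%. The weighted result is 47.5%, while the plain average would be 45%.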

Notes

  1. The LZAV API is not equivalent to the LZ4 or Snappy API. For example, the
    dstlen parameter of the decompressor must specify the original uncompressed
    length, which the application must have stored beforehand, independently
    of LZAV.

  2. From a technical point of view, peak decompression speeds of LZAV have an
    implicit limitation arising from its more complex stream format, compared to
    LZ4: LZAV decompression requires more code branching. Another limiting factor
    is a rather large, dynamic 2-512 MiB LZ77 window, which is not CPU
    cache-friendly. On the other hand, without these features it would not be
    possible to achieve competitive compression ratios while keeping
    compression speeds high.

  3. LZAV supports compression of contiguous data blocks of up to 2 GB. Larger
    data should be compressed in chunks of at least 16 MB. Using smaller chunks
    may reduce the achieved compression ratio.
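Notes 1 and 3 together suggest a simple framing scheme for large inputs: split the data into chunks of at least 16 MB, and record each chunk's original length next to its compressed bytes. The decompressor's dstlen is then always known. The sketch below covers only the bookkeeping and omits the compression calls. Its layout is an assumption, not an LZAV format:

- a hypothetical 16 MiB chunk size;
- a hypothetical 8-byte per-chunk header holding the original length, then the compressed length, both as 32-bit little-endian values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk size; Note 3 recommends chunks of at least 16 MB. */
#define CHUNK_SIZE ( (size_t) 16 << 20 )

/* Hypothetical per-chunk header: original length, then compressed length. */
#define CHUNK_HDR_SIZE 8

/* Number of chunks needed for total_len bytes of input. */
static size_t chunk_count( size_t total_len )
{
    return( ( total_len + CHUNK_SIZE - 1 ) / CHUNK_SIZE );
}

/* Uncompressed length of chunk i, for i < chunk_count( total_len );
 * only the last chunk may be shorter than CHUNK_SIZE. */
static size_t chunk_len( size_t total_len, size_t i )
{
    const size_t rest = total_len - i * CHUNK_SIZE;

    return( rest < CHUNK_SIZE ? rest : CHUNK_SIZE );
}

/* Stores v as 4 little-endian bytes, independent of host endianness. */
static void put_u32le( uint8_t* p, uint32_t v )
{
    p[ 0 ] = (uint8_t) v;
    p[ 1 ] = (uint8_t) ( v >> 8 );
    p[ 2 ] = (uint8_t) ( v >> 16 );
    p[ 3 ] = (uint8_t) ( v >> 24 );
}

/* Reads 4 little-endian bytes back into a 32-bit value. */
static uint32_t get_u32le( const uint8_t* p )
{
    return( (uint32_t) p[ 0 ] | (uint32_t) p[ 1 ] << 8 |
        (uint32_t) p[ 2 ] << 16 | (uint32_t) p[ 3 ] << 24 );
}

/* Writes one chunk header; when decoding, orig_len is what gets passed as
 * the decompressor's dstlen. */
static void write_chunk_hdr( uint8_t hdr[ CHUNK_HDR_SIZE ],
    uint32_t orig_len, uint32_t comp_len )
{
    put_u32le( hdr, orig_len );
    put_u32le( hdr + 4, comp_len );
}
```

Fixed little-endian byte order keeps such a container readable on both big- and little-endian hosts.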


This article is auto-generated from avaneev/lzav via the GitHub API. Last fetched: 6/18/2026