Releasing Czlib and Zstd Go Bindings | Datadog

Author: Jason Moiron

Published: July 11, 2016

To commemorate the third annual GopherCon US in Denver this week, we’re releasing cgo bindings to two compression libraries that we’ve been using in production at Datadog for a while now: czlib and zstd.

czlib started as a fork of the vitess project’s cgzip package. Our primary data pipeline uses zlib-compressed messages, but the standard library’s pure Go implementation can be significantly slower than the C zlib library. To close this gap, we modified a few flags in cgzip so that it encodes and decodes with zlib wrapping rather than gzip headers.

We detailed some of czlib’s more novel design decisions, including its batch interfaces, in our general blog post on performance in Go a couple of years ago. Performance varies quite a bit among the various interfaces, so it pays to benchmark with a message that is typical for your system. To do so, run the czlib benchmark suite with:

PAYLOAD=path_to_message go test -run=NONE -bench .
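As a rough illustration of how a payload-driven benchmark like this can be structured, the sketch below reads a message from the file named by the PAYLOAD environment variable and times compression of it. It is not the czlib suite itself: it benchmarks the standard library’s compress/zlib as a stand-in, and `loadPayload` and `benchCompress` are hypothetical names chosen for this example.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"os"
	"testing"
)

// loadPayload (hypothetical helper) reads the file at path, or falls back
// to a small synthetic message when path is empty.
func loadPayload(path string) ([]byte, error) {
	if path == "" {
		return bytes.Repeat([]byte("a typical plaintext message. "), 64), nil
	}
	return os.ReadFile(path)
}

// benchCompress (hypothetical helper) returns a benchmark function that
// compresses payload once per iteration with compress/zlib.
func benchCompress(payload []byte) func(*testing.B) {
	return func(b *testing.B) {
		b.SetBytes(int64(len(payload))) // lets the result report MB/s
		var buf bytes.Buffer
		for i := 0; i < b.N; i++ {
			buf.Reset()
			w := zlib.NewWriter(&buf)
			if _, err := w.Write(payload); err != nil {
				b.Fatal(err)
			}
			if err := w.Close(); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func main() {
	payload, err := loadPayload(os.Getenv("PAYLOAD"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchCompress(payload))
	fmt.Printf("BenchmarkCompressStdZlib\t%s\n", res)
}
```

Calling `b.SetBytes` with the payload length is what produces a throughput column alongside ns/op, which makes results for different message sizes easier to compare.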

Here are benchmark results from go1.7beta2 for compression and decompression using czlib’s non-streaming interface, czlib’s streaming interface, and the standard library’s compress/zlib. They show how much performance varies:

# using a 2 KB plaintext message
BenchmarkCompress-4                30000         47415 ns/op      44.42 MB/s
BenchmarkCompressStream-4          20000         61732 ns/op      34.11 MB/s
BenchmarkCompressStdZlib-4          5000        227182 ns/op       9.27 MB/s
BenchmarkDecompress-4             200000          8238 ns/op     255.62 MB/s
BenchmarkDecompressStream-4       100000         18352 ns/op     114.75 MB/s
BenchmarkDecompressStdZlib-4       50000         31565 ns/op      66.72 MB/s

# using a 1.7 MB plaintext message
BenchmarkCompress-4                   20      69808144 ns/op      24.70 MB/s
BenchmarkCompressStream-4             20      73170819 ns/op      23.56 MB/s
BenchmarkCompressStdZlib-4            20      70498763 ns/op      24.46 MB/s
BenchmarkDecompress-4                200       6709252 ns/op     256.98 MB/s
BenchmarkDecompressStream-4          200       6891833 ns/op     250.18 MB/s
BenchmarkDecompressStdZlib-4         100      14256445 ns/op     120.94 MB/s

zstd, short for Zstandard, is a relatively new, fast compression library from Yann Collet, the author of lz4. Its format was recently finalized, and a 1.0 release is pending. Compared with zlib at level 6, it compresses slightly faster at a slightly better ratio and decompresses much faster, which makes it a great general-purpose zlib replacement.

The zstd library supports some interfaces that are common in more advanced compression libraries, such as stream compression, compression levels, and pre-computed dictionaries. These are all exposed by our zstd Go binding, with the dictionary builder available in the upstream repos. The binding intentionally mimics the zlib interface, and aside from a few functions that do not return an error in zstd, it is functionally a drop-in replacement. It also exposes a fixed-length batch compression interface present in the underlying library, very similar to the lz4 interface.
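One way to take advantage of a zlib-shaped interface is to hide the constructors behind a small struct, so that the compression backend can be swapped in one place. The sketch below is an assumption about how calling code might be organized, not part of either binding: the `codec` type and `roundTrip` function are hypothetical, and only the standard library’s compress/zlib is wired in so that the example runs without cgo.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// codec (hypothetical) bundles the two constructors that a zlib-shaped
// package provides.
type codec struct {
	newWriter func(io.Writer) io.WriteCloser
	newReader func(io.Reader) (io.ReadCloser, error)
}

// stdZlib wires the standard library into the codec shape.
var stdZlib = codec{
	newWriter: func(w io.Writer) io.WriteCloser { return zlib.NewWriter(w) },
	newReader: zlib.NewReader,
}

// roundTrip compresses msg with c, then decompresses the result.
func roundTrip(c codec, msg []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := c.newWriter(&buf)
	if _, err := w.Write(msg); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	r, err := c.newReader(&buf)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	out, err := roundTrip(stdZlib, []byte("hello, gophers"))
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

A zstd-backed `codec` would follow the same pattern. Because a few of the zstd functions do not return an error, a constructor of that kind would need a one-line adapter to fit the `newReader` field; the exact function names should be checked against the binding’s documentation.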