Use native Snappy compression when running Spark on Alpine

Thomas Decaux
1 min read · Jul 2, 2022

Snappy is a fast compression algorithm used by Spark to write and read smaller Parquet files. Spark uses "snappy-java", a Java library from Xerial.
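For context, Snappy is Spark's default codec for Parquet, and it can also be selected explicitly. A minimal sketch, using the standard `spark.sql.parquet.compression.codec` option (the application name is a placeholder):

```shell
# Explicitly select Snappy for Parquet output (it is also Spark's default).
# "my_job.py" is a hypothetical placeholder for your Spark application.
spark-submit \
  --conf "spark.sql.parquet.compression.codec=snappy" \
  my_job.py
```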

snappy-java is a JNI-based implementation that achieves performance comparable to the native C++ version. It bundles native libraries built for Windows/Mac/Linux, etc., and loads the right one according to your machine environment (it looks at the system properties os.name and os.arch).

If no native library for your platform is found, snappy-java will fall back to a pure-Java implementation.

Alpine Linux, very common in Docker and Kubernetes images, does not ship glibc by default. The bundled native libraries therefore fail to load, and snappy-java falls back to the pure-Java implementation, which can be broken.


1. Install native snappy

Install the native Snappy package on the system.
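A minimal sketch for an Alpine image, assuming the Alpine `snappy` package (which provides the libsnappy shared library) is the one intended here:

```shell
# Inside the Alpine image (e.g. in a Dockerfile RUN step):
# install the native Snappy shared library from the Alpine repositories.
# The package name "snappy" is an assumption about the package meant above.
apk add --no-cache snappy
```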

2. Tell the Spark executor to use the system library

According to the snappy-java documentation, a JVM system property tells the library to load the system-installed native library instead of the bundled binaries.
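A hedged sketch of passing that property to the Spark driver and executors, assuming the snappy-java property `org.xerial.snappy.use.systemlib` (from the snappy-java README), the standard Spark `extraJavaOptions` settings, and `/usr/lib` as the library location from step 1:

```shell
# Force snappy-java to load the system-installed native library instead of
# the bundled (glibc-linked) one it would normally extract and load.
# /usr/lib is an assumption about where step 1 installed libsnappy;
# "my_job.py" is a hypothetical placeholder for your Spark application.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.use.systemlib=true -Djava.library.path=/usr/lib" \
  --conf "spark.executor.extraJavaOptions=-Dorg.xerial.snappy.use.systemlib=true -Djava.library.path=/usr/lib" \
  my_job.py
```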