Use native snappy compression when running Spark on Alpine

Thomas Decaux
Jul 2, 2022


Snappy is a fast compression algorithm used by Spark to write and read smaller Parquet files. Spark uses the Java library "snappy-java" from xerial:

snappy-java is a JNI-based implementation that achieves performance comparable to the native C++ version. It contains native libraries built for Windows, Mac, Linux, etc., and loads the appropriate one for your machine environment (it looks at the system properties os.name and os.arch).

If no native library for your platform is found, snappy-java will fall back to a pure-Java implementation.

On Alpine Linux, which is very common with Docker and Kubernetes, Alpine does not ship glibc by default (it uses musl), so snappy-java falls back to the pure-Java implementation, which can be broken.

Solution

1. Install native snappy

Install this package https://pkgs.alpinelinux.org/package/edge/community/x86_64/java-snappy-native
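On an Alpine-based image this can be done with apk; a minimal sketch, assuming the standard Alpine CDN mirror (on stable releases the package may live in a different repository than edge/community, so adjust the URL to your Alpine version):

```shell
# Install the pre-built native snappy JNI library from the Alpine
# community repository. The --repository flag is only needed when the
# package is not available in the repositories already configured in
# /etc/apk/repositories.
apk add --no-cache java-snappy-native \
    --repository=https://dl-cdn.alpinelinux.org/alpine/edge/community
```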

2. Tell the Spark executor to use the system library

According to the SnappyLoader source (https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/SnappyLoader.java), snappy-java honors the system property org.xerial.snappy.use.systemlib, which makes it load a pre-installed native library instead of the bundled one.
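Based on that property, the executor (and driver) JVMs can be pointed at the system library through Spark's extraJavaOptions. A hedged sketch via spark-submit, where the job file name is purely illustrative:

```shell
# Sketch: tell snappy-java's SnappyLoader to skip extracting its bundled
# binary and use the native library installed on the system (e.g. via apk).
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dorg.xerial.snappy.use.systemlib=true" \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.use.systemlib=true" \
  my-job.py
```

The same properties can instead be set in spark-defaults.conf if you prefer cluster-wide configuration over per-job flags.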
