Re: Transparent compression with ext4 - especially with zstd

On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote:
> On 21.01.2025 05:01, Theodore Ts'o wrote:
> > On Sun, Jan 19, 2025 at 03:37:27PM +0100, Gerhard Wiesinger wrote:
> > > Are there any plans to include transparent compression with ext4 (especially
> > > with zstd)?
> > I'm not aware of anyone in the ext4 development community working on
> > something like this.  Fully transparent compression is challenging,
> > since supporting random writes into a compressed file is tricky.
> > There are solutions (for example, the Stac patent which resulted in
> > Microsoft paying $120 million), but even ignoring the
> > intellectual property issues, they tend to compromise the efficiency
> > of the compression.
> > 
> > More to the point, given how cheap byte storage tends to be (dollars
> > per IOPS tend to be far more of a constraint than dollars per GB),
> > it's unclear what the business case would be for any company to fund
> > development work in this area, when the cost of a slightly larger HDD
> > or SSD is going to be far cheaper than the necessary software
> > engineering investment needed, even for a hyperscaler cloud company
> > (and even there, it's unclear that transparent compression is really
> > needed).
> > 
> > What is the business and/or technical problem which you are trying to
> > solve?
> > 
> Regarding necessity:
> In some scenarios we are talking about savings of several factors of
> disk space. E.g. in my database scenario with PostgreSQL, around 85% of
> disk space can be saved (roughly a factor of 7).

So use a database that has built-in data compression capabilities.

e.g. MySQL has transparent table compression functionality.
This requires sparse files and FALLOC_FL_PUNCH_HOLE support in the
filesystem, but there is no need for any special filesystem-side
support for data compression to get space gains of up to 75% on
compressible data sets with the default database (16kB record size)
and filesystem (4kB block size) configs: a 16kB page that compresses
down into a single 4kB block frees the other three blocks.
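For reference, here's a minimal userspace sketch of that hole-punching
scheme (my illustration, not MySQL's actual code; the sizes and the
helper name are assumptions): write the compressed page at its normal
file offset, then punch a hole over the unused tail so the filesystem
releases the leftover blocks while the file size stays the same.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SIZE  (16 * 1024)   /* database page/record size (assumed) */
#define FS_BLOCK   (4 * 1024)    /* filesystem block size (assumed)     */

/* Hypothetical helper: store one compressed page sparsely. */
static int write_compressed_page(int fd, off_t page_off,
                                 const void *cbuf, size_t clen)
{
        /* Round the compressed length up to a whole filesystem block. */
        size_t used = ((clen + FS_BLOCK - 1) / FS_BLOCK) * FS_BLOCK;

        /* Write the compressed bytes at the page's usual offset. */
        if (pwrite(fd, cbuf, clen, page_off) != (ssize_t)clen)
                return -1;

        /* Punch a hole over the unused tail of the 16kB page so the
         * filesystem frees those blocks; the file size is unchanged. */
        if (used < PAGE_SIZE &&
            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      page_off + used, PAGE_SIZE - used) < 0)
                return -1;

        return 0;
}

Rewriting the page later just repeats the same two steps; writing into
the punched range makes the filesystem allocate blocks again.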

The argument that "application level compression is hard, so we want
the filesystem to do it for us" ignores the fact that it is -much
harder- to do efficient compression in the filesystem than at the
application level.

The OS and filesystem don't have the freedom to control
application-level data access patterns, nor can they tailor the
compression algorithms to match how the application manages data, so
everything the filesystem implements is a compromise. It will never
be optimal for any given workload, because we have to make sure that
it is not complete garbage for any other workload...
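As a concrete (hypothetical) illustration of that freedom: an
application can compress each record with a dictionary trained on its
own table data and a level chosen for the workload, something a
filesystem cannot do because it only sees anonymous blocks. A minimal
sketch using libzstd; the helper and its parameters are my assumptions:

#include <stddef.h>
#include <zstd.h>

/*
 * Hypothetical application-side helper: compress one record with a
 * dictionary built from this table's own data and a level picked to
 * suit the workload.  The filesystem knows neither the record
 * boundaries nor a suitable dictionary; the application knows both.
 */
static size_t compress_record(ZSTD_CCtx *cctx,
                              void *dst, size_t dst_cap,
                              const void *rec, size_t rec_len,
                              const void *dict, size_t dict_len,
                              int level)
{
        size_t n = ZSTD_compress_usingDict(cctx, dst, dst_cap,
                                           rec, rec_len,
                                           dict, dict_len, level);

        return ZSTD_isError(n) ? 0 : n;
}

The caller would create the context once with ZSTD_createCCtx(), size
dst with ZSTD_compressBound(rec_len), and could then store the result
sparsely as in the earlier sketch.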

> In cloud usage scenarios you can easily reduce the amount of allocated
> disk space by around a factor of 7 and therefore reduce costs.

Same argument: cloud applications should be managing their data
sets appropriately and efficiently, not relying on the cloud storage
infrastructure to magically do stuff to "reduce costs" for them.

Remember: there's a massive conflict of interest on the vendor side
here - the less efficient the application (be it CPU, RAM or storage
capacity), the more money the cloud vendor makes from users running
that application. Hence they have little motivation to provide
infrastructure or application functionality that costs them money to
implement and has the impact of reducing their overall revenue
stream...

> You might also get a performance boost by using caching mechanisms more
> efficiently (e.g. using less RAM).

Not true. Linux caches uncompressed data in the page cache - caching
compressed data will significantly increase the memory footprint and
CPU consumption as it has to be constantly uncompressed and
recompressed as the data changes. This is not a viable caching
strategy for a general purpose OS.

> Also with precompressed files (e.g. photos, videos) you can save around 5-10%

Video and photos do not compress sufficiently to be a viable runtime
compression target for filesystem based compression. It's a massive
waste of resources to attempt compression of internally compressed
data formats for anything but cold data storage. And even then, if
it's cold storage then the data should be compressed and checksummed
by the cold storage application before it is written to the
filesystem.

> The technical issue is that IMHO no stable and practically usable Linux
> filesystem with transparent compression exists in the default kernel.
> - ZFS works but is not included in the default kernel
> - BTRFS has stability and repair issues (see mailing lists) and bugs with
> compression (does not compress on the fly in some scenarios)

I hear this sort of generic "btrfs is not stable/has bugs" complaint
as a reason for not using btrfs all the time.

I hear just as many, if not more, generic "XFS is unstable and loses
data" claims as a reason for not using XFS, too.

Anecdotal claims are not proof of fact, and I don't see any real
evidence that btrfs is unstable.  e.g. Fedora has been using btrfs
as the default root filesystem for quite a while now, and there has
been no noticeable increase in bug reports (either for fs
functionality or data loss) compared to when ext4 or XFS was used as
the default filesystem type...

IOWs, I redirect generic "btrfs is unstable" complaints to /dev/null
these days, just like I do with generic "XFS is unstable"
complaints.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



