MySQL, MariaDB and PostgreSQL do their own schema- and page-size-aware compression. Why not let the databases do this? They are in a better position to do it and to trade off the costs where and when it matters to them.

-- Oleg Kiselev

On 1/21/25, 11:35, "Theodore Ts'o" <tytso@xxxxxxx> wrote:

On Tue, Jan 21, 2025 at 07:47:24PM +0100, Gerhard Wiesinger wrote:
> We are talking in some scenarios about some factors of diskspace. E.g. in
> my database scenario with PostgreSQL around 85% of disk space can be saved
> (e.g. around factor 7).

The problem with using compression with databases is that they need to be able to do random writes into the middle of a file. That means you need to use tricks such as writing into compression clusters, typically 32k or 64k, and so a single 4k random write gets amplified into a 32k or 64k write (a rough sketch of this read-modify-write cycle appears after the postscript below).

> In cloud usage scenarios you can easily reduce that amount of allocated
> diskspace by around a factor 7 and reduce cost therefore.

If you are running this on a cloud platform, where you are limited (on GCE) or charged (on AWS) by IOPS and throughput, this can be a performance bottleneck (or cost you extra). At a minimum, the extra I/O throughput will very likely show up on various performance benchmarks.

Worse, transparent compression breaks the ACID properties of the database. If you crash or have a power failure while rewriting the 64k compression cluster, all or part of that 64k compression cluster can be corrupted. And if your customers care about (their) data integrity, the fact that you cheaped out on disk space might not be something that would impress them terribly.

The short version is that transparent compression is not free, even if you ignore the SWE development costs of implementing such a feature and then getting it fit for use in an enterprise setting. No matter what file system you might want to use, I *strongly* suggest that you get a power-fail rack, put the whole stack on it, and drop power while running a stress test --- over, and over, and over again. What you find might surprise you.

> The technical topic is that IMHO no stable and practical usable Linux
> filesystem which is included in the default kernel exists.
> - ZFS works but is not included in the default kernel
> - BTRFS has stability and repair issues (see mailing lists) and bugs with
> compression (does not compress on the fly in some scenarios)
> - bcachefs is experimental

When I started work at Google 15 years ago to deploy ext4 into production, we did precisely this, as well as deploying to a small percentage of Google's test fleet to do A:B comparisons before we deployed to the entire production fleet. Whether or not it is "practical" and "usable" depends on your definition, I guess, but from my perspective "stable" and "not losing users' data" is job #1.

But hey, if it's worth so much to you, I suggest you cost out what it would take to actually implement the features you want, or to make the more complex file systems stable for production use. You might decide that paying the extra storage costs is far cheaper than the software engineering investment involved.
At Google, and when I was at IBM before that, we were always super disciplined about figuring out the ROI of a particular project and not just doing it because it was "cool". There's a famous story about how the engineers working on ZFS didn't ask for management's permission or input from the sales team before they started. Sounds great, and there was some cool technology in ZFS --- but note that Sun had to put the company up for sale because they were losing money...

Cheers,

- Ted

P.S. Note: using a compression cluster is the only real way to support transparent compression if you are using an update-in-place file system like ext4 or xfs. (And that is what was covered by the Stac patents that I mentioned.) If you are using a log-structured file system, such as ZFS, then you can simply write a new compression cluster *and* update the file system metadata to point at it --- but then the garbage collection costs and the file system metadata update costs for each database commit are *huge*, and the I/O throughput hit is even higher. So much so that ZFS recommends that you turn off the log-structured write path and do update-in-place if you want to run a database on ZFS. But I'm pretty sure that this disables transparent compression if you are using update-in-place. TNSTAAFL.
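As a rough illustration of the compression-cluster write amplification described above, here is a minimal Python sketch. It uses zlib purely as a stand-in compressor; the cluster layout, sizes, and the write_block helper are all hypothetical, not how ext4 or any real file system implements this.

import zlib

CLUSTER_SIZE = 64 * 1024   # one compression cluster, the 64k case above
BLOCK_SIZE = 4 * 1024      # size of the database's random write

def write_block(compressed_cluster, block_index, new_block):
    # Even though the caller only changed 4 KiB, the whole cluster has to be
    # read and decompressed, patched in memory, then recompressed and written
    # back in full -- the write amplification described above (16x for 64k).
    data = bytearray(zlib.decompress(compressed_cluster))   # read + decompress 64 KiB
    off = block_index * BLOCK_SIZE
    data[off:off + BLOCK_SIZE] = new_block                  # the actual 4 KiB change
    return zlib.compress(bytes(data))                       # recompress + rewrite 64 KiB

# One cluster of zeroes, then a single 4 KiB update into it.
cluster = zlib.compress(bytes(CLUSTER_SIZE))
cluster = write_block(cluster, block_index=3, new_block=b"\xff" * BLOCK_SIZE)

A crash partway through that final rewrite is exactly the torn-cluster failure warned about above: the on-disk copy is neither the old cluster nor the new one.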
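The copy-on-write alternative from the P.S. can be sketched the same way. The names below (TinyLogStore, cluster_map, garbage_bytes) are invented for illustration, and real log-structured file systems such as ZFS are far more involved; the point is only that every compressed-cluster rewrite appends a new copy and updates metadata, while the old copy becomes garbage to collect later.

import zlib

class TinyLogStore:
    def __init__(self):
        self.log = bytearray()    # append-only data area
        self.cluster_map = {}     # cluster number -> (offset, length) metadata
        self.garbage_bytes = 0    # space waiting for garbage collection

    def write_cluster(self, cluster_no, data):
        if cluster_no in self.cluster_map:
            _, old_len = self.cluster_map[cluster_no]
            self.garbage_bytes += old_len                       # old copy is now garbage
        payload = zlib.compress(data)
        offset = len(self.log)
        self.log += payload                                     # append the new compressed copy
        self.cluster_map[cluster_no] = (offset, len(payload))   # metadata update on every write

    def read_cluster(self, cluster_no):
        offset, length = self.cluster_map[cluster_no]
        return zlib.decompress(bytes(self.log[offset:offset + length]))

store = TinyLogStore()
store.write_cluster(0, bytes(64 * 1024))       # initial write of one cluster
store.write_cluster(0, b"\xff" * 64 * 1024)    # rewrite: appended again, old copy is garbage

The in-memory map here stands in for the on-disk metadata that has to be updated per rewrite, which is where the garbage-collection and metadata-update costs mentioned in the P.S. come from when a database issues many small commits.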