Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)

On Thu, Feb 11, 2021 at 9:58 AM Jeremy Linton <jeremy.linton@xxxxxxx> wrote:
>
> Hi,
>
> On 1/1/21 8:59 PM, Chris Murphy wrote:

> > Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm
> > not even sure someone with a very fast NVMe drive will notice a slow
> > down because the compression/decompression is threaded.
>
> I disagree that everyone benefits. Any read-latency-sensitive workload
> will be slower, because the application latency is the drive latency
> plus the decompression latency. And as the kernel benchmarks indicate,
> very few systems are going to get anywhere near the throughput of even
> baseline NVMe drives.

It's possible some workloads on NVMe might have faster reads or writes
without compression.

https://github.com/facebook/zstd

btrfs compress=zstd:1 translates into zstd -1 right now; there has
been some discussion of remapping btrfs zstd:1 to one of the newer
zstd --fast levels, but that's only an idea at this point. In any case
the defaults stay at level 3 on both sides: 'compress=zstd' maps to
'compress=zstd:3', just as the zstd CLI defaults to -3.

I have a laptop with NVMe and haven't come across such a workload so
far, but this is obviously not a scientific sample. I think you'd need
a process that's producing read/write rates that the storage can meet,
but that the compression algorithm limits. Btrfs is threaded, as is
the compression.

What's typical is no change in performance, and sometimes a small
increase. Compression essentially trades some CPU cycles for less IO.
That means less time reading and writing, but also lower latency, so
the gain on rotational media is greater.

> Worse, if the workload is very parallel and already at max CPU, the
> compression overhead will only make that situation worse. (I suspect
> you could test this just by building some packages that have good
> parallelism during the build).

This is compiling the kernel on a 4/8-core CPU (circa 2011) using
make -j8; the running kernel is 5.11-rc7.

no compression

real    55m32.769s
user    369m32.823s
sys     35m59.948s

------

compress=zstd:1

real    53m44.543s
user    368m17.614s
sys     36m2.505s

That's a one-time test, and it's a ~3% improvement. *shrug* We don't
really care too much these days about 1-3% differences when doing
encryption, so I think this is probably in that ballpark, even if it
turns out another compile is 3% slower. This is not a significantly
read- or write-centric workload; it's mostly CPU-bound. So the 3%
difference may not even be related to the compression.
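If anyone wants to reproduce the comparison, the A/B setup is roughly
the following (run as root; the mount point is a placeholder, and the
build tree has to live on the remounted filesystem; a remount only
affects newly written data, so do a clean build in each configuration):

# A: compression enabled for new writes
mount -o remount,compress=zstd:1 /
make clean && time make -j8

# B: compression disabled for new writes
mount -o remount,compress=no /
make clean && time make -j8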


> Plus, the write amplification comment isn't even universal as there
> continue to be controllers where the flash translation layer is
> compressing the data.

At least consumer SSDs tend to just do concurrent write dedup. File
system compression isn't limited to Btrfs, either: F2FS, contributed
by Samsung, also implements compression these days, although there
it's committed to at mkfs time, whereas on Btrfs it's a mount option.
Mixing and matching compressed extents is routine on Btrfs anyway, so
there's no concern about users mixing things up. They can change the
compression level and even the algorithm with impunity, just by
tacking it onto a remount command. It's not even necessary to reboot.
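In practice that looks something like this (the mount point is just an
example; rewriting old extents is optional, and only needed if you want
the new setting applied to data that's already on disk):

# switch newly written data to a different level, no reboot needed
mount -o remount,compress=zstd:2 /home

# existing extents keep whatever they were written with; they can be
# rewritten with the current algorithm via defragment if desired
btrfs filesystem defragment -r -czstd /home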


> OTOH, it makes a lot more sense on a lot of these arm/sbc boards
> utilizing MMC because the disks are so slow. Maybe if something like
> this were made the default the machine should run a quick CPU
> compress/decompress vs IO speed test and only enable compression if the
> compress/decompress speed is at least the IO rate.

It's not that simple, because neither the user space writers nor the
kworkers are single threaded. You'd need a particularly fast NVMe
drive matched with a not-so-fast CPU, and a workload that dumps enough
data that the compression becomes the bottleneck.

It could exist, but it's not a problem I've seen in practice. If you
propose a test, I can do A/B testing.
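A rough first-pass check, if someone wants one, would be to compare
zstd's own benchmark against the drive's raw read rate. Note that the
zstd CLI benchmark below is single threaded, so it understates what the
threaded kernel path can do; the file and device names are placeholders:

# compression/decompression throughput at level 1 on representative data
zstd -b1 vmlinux

# raw sequential read rate of the device, bypassing the page cache
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 iflag=direct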


--
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure



