On Thu, Feb 11, 2021 at 9:58 AM Jeremy Linton <jeremy.linton@xxxxxxx> wrote:
>
> Hi,
>
> On 1/1/21 8:59 PM, Chris Murphy wrote:
> > Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm
> > not even sure someone with a very fast NVMe drive will notice a slow
> > down because the compression/decompression is threaded.
>
> I disagree that everyone benefits. Any read latency sensitive workload
> will be slower due to the application latency being both the drive
> latency plus the decompression latency. And as the kernel benchmarks
> indicate, very few systems are going to get anywhere near the
> performance of even baseline NVMe drives when it comes to throughput.

It's possible some workloads on NVMe might have faster reads or writes
without compression.

https://github.com/facebook/zstd

btrfs compress=zstd:1 translates into zstd -1 right now. There are some
ideas to remap btrfs zstd:1 to one of the newer zstd --fast options, but
it's just an idea. In any case the defaults for btrfs and zstd will
remain 3 and -3 respectively, which is what 'compress=zstd' maps to,
making it identical to 'compress=zstd:3'.

I have a laptop with NVMe and haven't come across such a workload so
far, but this is obviously not a scientific sample. I think you'd need a
process producing read/write rates that the storage can meet, but that
the compression algorithm limits. Btrfs is threaded, and so is the
compression. What's typical is no change in performance, and sometimes a
small increase. It essentially trades some CPU cycles for less IO. That
means less time reading and writing, but also lower latency, so the gain
on rotational media is greater.

> Worse, if the workload is very parallel, and at max CPU already, the
> compression overhead will only make that situation worse as well. (I
> suspect you could test this just by building some packages that have
> good parallelism during the build).

This is compiling the kernel on a 4/8-core CPU (circa 2011) using
make -j8; the kernel running is 5.11-rc7.

no compression
real    55m32.769s
user    369m32.823s
sys     35m59.948s
------
compress=zstd:1
real    53m44.543s
user    368m17.614s
sys     36m2.505s

That's a one-time test, and it's a ~3% improvement. *shrug* We don't
really care too much these days about 1-3% differences when doing
encryption, so I think this is probably in that ballpark, even if it
turns out another compile is 3% slower. This is not a significantly read
or write centric workload; it's mostly CPU. So this 3% difference may
not even be related to the compression.

> Plus, the write amplification comment isn't even universal as there
> continue to be controllers where the flash translation layer is
> compressing the data.

At least consumer SSDs tend to just do concurrent write dedup.

File system compression isn't limited to Btrfs. F2FS, contributed by
Samsung, implements compression these days as well, although there you
commit to it at mkfs time, whereas on Btrfs it's a mount option. Mixing
and matching compressed extents is routine on Btrfs anyway, so there's
no concern about users mixing things up. They can change the compression
level and even the algorithm with impunity, just by tacking it onto a
remount command. It's not even necessary to reboot.
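If anyone wants to try it, here's roughly what that looks like (the
mount point and directory below are just placeholder examples):

  # switch an already-mounted filesystem to zstd level 1; affects new writes
  sudo mount -o remount,compress=zstd:1 /home

  # optionally rewrite and recompress existing data
  sudo btrfs filesystem defragment -r -czstd /home/somedir

  # compsize, if installed, reports the actual on-disk compression ratio
  sudo compsize /home/somedir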
> OTOH, it makes a lot more sense on a lot of these arm/sbc boards
> utilizing MMC because the disks are so slow. Maybe if something like
> this were made the default, the machine should run a quick CPU
> compress/decompress vs IO speed test and only enable compression if the
> compress/decompress speed is at least the IO rate.

It's not that simple, because neither the user space writers nor the
kworkers are single threaded. You'd need a particularly fast NVMe drive
matched with a not-so-fast CPU and a workload that somehow dumps a lot
of data in a way that makes the compression a bottleneck. It could
exist. But it's not a problem I've seen, per se. If you propose a test,
though, I can do A/B testing.

--
Chris Murphy
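P.S. A rough sketch of the kind of quick calibration test you're
describing, just for comparison (the sample file and device name are
placeholders; zstd's benchmark mode runs in memory and single threaded,
while the kernel spreads compression across kworkers, so it only
approximates the CPU side):

  # CPU side: zstd's built-in benchmark, levels 1 through 3,
  # prints compression and decompression speeds in MB/s
  zstd -b1 -e3 some-large-file

  # IO side: sequential read straight from the device, bypassing the
  # page cache; dd prints the read rate when it finishes
  sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 iflag=direct

If the decompression speed from the first command is comfortably above
the read rate from the second, compression shouldn't be the bottleneck
for reads; writes would still need a real A/B workload.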