On Fri, Jul 10, 2020 at 1:45 PM Tomasz Torcz <tomek@xxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 10, 2020 at 07:14:09PM +0200, Vitaly Zaitsev via devel wrote:
> > On 26.06.2020 16:42, Ben Cotton wrote:
> > > ** transparent compression: significantly reduces write amplification,
> > > improves lifespan of storage hardware
> >
> > What can you say about this? https://arxiv.org/pdf/1707.08514.pdf
>
> Also funny note: when compression was introduced in ZFS, circa 2007,
> it was mainly promoted as a _performance_ win, not a space saving measure.
> This was still 5 years before NVMe, so all we had was SATA, SAS and FC
> drives, yet the CPUs were already multi-core and multi-gigahertz.
> Transferring uncompressed data was _slower_ than compressing/decompressing
> and having to transfer less data. For a bit higher CPU usage we got
> noticeable bandwidth wins.
> The tradeoff is no longer there, as single drives reach 7GiB/s
> transfer speed.

It would need to be benchmarked. The CPU in these cases has also
improved dramatically, perhaps more so than storage performance, in
which case compression may still not be the limiting factor.

lzbench is useful for this. Compiling it on Fedora is straightforward,
but needs this hint or a better understanding of the problem:
https://github.com/inikep/lzbench/issues/69
Note that you should use -b 128K, since the Btrfs compression block
size is 128KiB.

There are a variety of corpora available; I use silesia.tar:
http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
But you can also just tar up /usr or /home.

This benchmark introduces some error. Btrfs compression is per file,
and files smaller than 128K tend to have a lower compression ratio, so
in that regard lzbench overestimates the compression; on the other
hand, Btrfs can use inline extents, and in that regard the compression
(or more correctly, the actual cost of the write) is underestimated.
Another source of error is single-threaded vs. multi-threaded
compression, and single-queue vs. multi-queue block devices. Yet
another is that lzbench has essentially no latency, since it's just one
file being tested, whereas in real-world usage there are many files
being read and written, each with its own latency, during which
compression can happen at essentially no additional latency cost. But
not always at no cost. So it's actually really complicated, which is
probably why no one really wants to do this kind of detailed
benchmarking analysis.
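For anyone who wants to eyeball the per-file effect without setting up
lzbench, here's a rough sketch of the idea in Python (my own, nothing
to do with btrfs-progs or lzbench; it uses stdlib zlib as a stand-in
for the kernel compressors and reads whole files into memory, so point
it at something small first). It compresses the same data twice in
128KiB blocks: once per file, which is closer to what Btrfs does, and
once as a single concatenated stream, which is roughly what a tarball
benchmark sees. The gap between the two ratios is roughly the
overestimate described above.

import os
import sys
import zlib

BLOCK = 128 * 1024   # Btrfs compresses data in 128 KiB chunks
LEVEL = 3            # zlib level for this estimate; adjust to taste

def compress_blocks(data):
    """Sum of compressed sizes when data is cut into 128 KiB blocks."""
    return sum(len(zlib.compress(data[o:o + BLOCK], LEVEL))
               for o in range(0, len(data), BLOCK))

def main(root):
    raw = per_file = stream_total = 0
    stream = bytearray()   # files concatenated, ignoring boundaries (tar-like)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue
            raw += len(data)
            per_file += compress_blocks(data)   # "Btrfs-like": each file on its own
            stream += data                      # "lzbench-like": one big stream
            while len(stream) >= BLOCK:
                stream_total += len(zlib.compress(stream[:BLOCK], LEVEL))
                del stream[:BLOCK]
    if stream:
        stream_total += len(zlib.compress(stream, LEVEL))
    if raw == 0:
        print("no readable files found under", root)
        return
    print(f"raw bytes:               {raw}")
    print(f"per-file 128K blocks:    {per_file}  (ratio {raw / per_file:.2f})")
    print(f"one-stream 128K blocks:  {stream_total}  (ratio {raw / stream_total:.2f})")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/usr/share/doc")

The absolute numbers won't match Btrfs (different compressors, no
inline extents, no metadata overhead), but it's enough to see how much
small files drag the ratio down.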
We're probably better off making a new benchmark based on ordinary
things: compiling the kernel, launching applications, doing updates,
updating git repositories and searching git logs, etc. But even that is
just a guess.

That reminds me: there's a git-based approach for aging a file system.
https://www.usenix.org/system/files/hotstorage19-paper-conway.pdf
https://github.com/saurabhkadekodi/geriatrix
I haven't messed around with it, but maybe someone wants to turn it
into a how-to. I'll do the testing if no one else wants to burn their
SSD with writes; I've got a Samsung 840 EVO in an old laptop that I'm
actively trying to kill off.

Something that can't be accounted for without blind studies involving
users is that users are hypersensitive to some latencies and not at all
sensitive to others. I haven't dug up any research on this, but I
imagine it has been studied. Apple made a bunch of UI changes early in
the Mac OS X development cycle, and while overall latencies were lower
as a result of having an (almost) preemptive multitasking OS instead of
the former cooperative multitasking OS, the GUI had so many "eye candy"
special effects that users got pissed at how slow the OS seemed.

--
Chris Murphy