On Tue, Feb 16, 2021 at 4:10 PM Jeremy Linton <jeremy.linton@xxxxxxx> wrote:
> On 2/14/21 2:20 PM, Chris Murphy wrote:
> > This isn't sufficiently qualified. It does work to reduce space
> > consumption and write amplification. It's just that there's a tradeoff
> > that you dislike, which is IO reduction. And it's completely
> > reasonable to have a subjective position on this tradeoff. But no
> > matter what there is a consequence to the choice.
>
> IO reduction in some cases (see below), for additional read latency,
> and an increase in CPU utilization.
>
> For a desktop workload the former is likely a larger problem. But as we
> all know, sluggishness is a hard thing to measure on a desktop. QD1
> pointer chasing on disk is a good approximation, though; sometimes boot
> times are too.

What is your counter-proposal?

> > A larger file might have a mix of compressed and non-compressed
> > extents, based on this "is it worth it" estimate. This is the
> > difference between the compress and compress-force options, where
> > force drops this estimator and depends on the compression algorithm to
> > do that work. I sometimes call that estimator the "early bailout"
> > check.
>
> Compression estimation is its own ugly ball of wax. But ignoring that
> for the moment, consider what happens if you have a bunch of 2G database
> files with a reasonable compression ratio. Let's assume for a moment the
> database attempts to update records in the middle of the files. What
> happens when the compression ratio gets slightly worse? (It's likely you
> already have nodatacow.)

What percentage of Fedora desktop users do you estimate have a bunch of
2G database files?

I don't assume datacow or nodatacow for databases, because some
databases and their workloads do OK on COW filesystems and others
don't. Also, nodatacow disables compression: files with the file
attribute 'C' (nodatacow) remain uncompressed even with the mount
option compress(-force).
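For readers unfamiliar with the two mount options named above, here is a
hedged sketch of how they might appear in /etc/fstab (the UUID
placeholder and the zstd level are illustrative, not a recommendation;
only one entry would be used at a time):

```
# "compress": run the early-bailout estimator and leave extents that
# don't look worth compressing uncompressed
UUID=<your-fs-uuid>  /  btrfs  compress=zstd:1  0 0

# "compress-force": skip the estimator and hand every extent to the
# compression algorithm
UUID=<your-fs-uuid>  /  btrfs  compress-force=zstd:1  0 0

# Either way, files carrying the 'C' (nodatacow) attribute, e.g. set
# via "chattr +C", stay uncompressed.
```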
> Although this becomes a case of seeing if the "compression estimation"
> logic is smart enough to detect it's causing poor IO patterns (while
> still having a reasonably good compression ratio).

The "early bail" heuristic just tries to estimate whether the effort of
compression is worth it. If it is, the data extent is submitted for
compression; if it's not worth it, it isn't. The max extent size for
this is 128KiB. There's no IO pattern detection. Once the compression
has happened, the write allocator works the same as without compression.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/compression.c?h=v5.11#n1314
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/compression.c?h=v5.11#n1609

> In a past life, I spent a not inconsequential part of a decade
> engineering compressed ram+storage systems (similar to what has been
> getting merged to mainline over the past few years). It's really hard
> to make one that is performant across a wide range of workloads. What
> you get are areas where it can help, but if you average those cases
> with the ones where it hurts, the overwhelming analysis is you
> shouldn't be compressing unless you want the capacity. The worst part
> is that most synthetic file IO benchmarks tend to be on the "it helps"
> side of the equation and the real applications on the other.

This is why I tend to poo-poo benchmarks. They're useful for the narrow
purpose they're intended to measure. Synthetic benchmarks are good at
exposing problems, but won't tell you their significance, so what they
expose is the need for better testing. A database benchmark will do a
good job showing performance issues with workloads that act like the
database the benchmark is mimicking. Not all databases have the same
behavior.

> IMHO if Fedora wanted to take a hit on the IO perf side, a much better
> place to focus would be flipping encryption on.
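To make the "is it worth it" idea concrete, here is a minimal sketch in
Python. It is only an illustration of the concept: it trial-compresses a
sample with zlib at a cheap level, whereas the actual kernel heuristic
in compression.c avoids running the full compressor and instead samples
the data and estimates its compressibility. The function name and the
0.9 threshold are made up for this sketch.

```python
import os
import zlib

EXTENT = 128 * 1024  # 128KiB, the max extent size compression works on


def worth_compressing(data: bytes, threshold: float = 0.9) -> bool:
    """Crude stand-in for the "early bailout" check: trial-compress
    one extent-sized sample cheaply and bail out unless the result
    is meaningfully smaller than the input."""
    sample = data[:EXTENT]
    compressed = zlib.compress(sample, 1)  # level 1: cheap and fast
    return len(compressed) < threshold * len(sample)


# Highly repetitive data easily clears the threshold...
print(worth_compressing(b"A" * EXTENT))
# ...while random (already high-entropy) data triggers the bailout.
print(worth_compressing(os.urandom(EXTENT)))
```

Under this sketch, extents that fail the check are written uncompressed,
which is the behavior the compress (as opposed to compress-force) mount
option preserves.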
> The perf profile is flatter (aes-ni & the arm crypto extensions are
> common) with fewer evil edge cases. Or a more controlled method might
> be picking a couple of fairly atomic directories and enabling
> compression there (say /usr).

The Workstation WG has been tracking these:
https://pagure.io/fedora-workstation/issue/136
https://pagure.io/fedora-workstation/issue/82

A significant impediment to ticking the "Encrypt my data" checkbox by
default in automatic partitioning is the UI/UX. The current evaluation
centers on using systemd-homed to encrypt user data by default, and
optionally enabling system encryption with the key sealed in the TPM or
protected on something like a YubiKey. There's still some work to do to
get this integrated.

--
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure