On Sat, 2020-06-27 at 12:42 -0600, Chris Murphy wrote:
> On Sat, Jun 27, 2020 at 8:01 AM Konstantin Kharlamov <hi-angel@xxxxxxxxx> wrote:
> > I see no one mentioned yet: BTRFS is slow on HDDs. It trivially comes
> > from BTRFS being COW. So if you changed a bit in a file, BTRFS will copy
> > a block (or maybe a number of them, not sure this detail matters) to
> > another place, and now your data got fragmented. SSDs may not care, HDDs
> > on the other hand do.
>
> It's faster on some workloads, slower on others. There are
> optimizations to help make up for COW: inline extents for small files,
> and random writes that commit together (i.e. in the same 30s window)
> will be written as sequential writes. It is true btrfs does not have
> nearly as many locality optimizations as ext4 and xfs, but at least
> xfs developers have recently proposed removing those HDD optimizations
> in favor of optimizations that are more relevant to today's hardware
> and workloads.
>
> > Another reason worth mentioning: BTRFS per se is slow. If you look at
> > benchmarks on Phoronix comparing BTRFS with others, BTRFS is rarely
> > even on par with them.
>
> It wins some. It loses others.

This sounds very wrong: it deludes readers into thinking BTRFS is on par with other FSes. If you head over to the Phoronix article you linked below and count how many times BTRFS won, was on par, or lost, you'll see the ratio is not even close to being in BTRFS's favor. To save you the effort, it is:

    type     | win | on par | lose
    NVMe     |  3  |   4    |   8
    SATA SSD |  0  |   5    |  10
    USB SSD  |  0  |   1    |   4

FYI, in this tally I gave BTRFS the benefit of the doubt a few times and counted it as either "winning" or "on par" when it was ahead of only some of the other FSes. (I don't know why the "USB SSD" section has many tests missing.)

> Head over to the xfs list and enjoy the
> benchmark commentary from people who actually understand benchmarking.
> A recurring theme is that a benchmark is only as relevant to the
> degree it actually mimics the workload you care about. And most
> benchmark tools don't do that very well.
>
> Here's a benchmark that's apples to apples because I'm merely timing
> the time to compile the exact same thing each time, twice.
> https://docs.google.com/spreadsheets/d/1b-y2WVrQK4ijo1TS5aRe0QROSf8CU3ckTiPQ_8evGR0/edit#gid=0

What point are you trying to make here? If you're implying that the "application startup time" the article measured is more of a synthetic test than the kernel compilation time you are measuring, that sounds odd, because people start apps far more often than they compile kernels. In fact, the compilation process itself involves starting up apps.

> They're all in the same ballpark, except there's a write time hit for
> the one with zstd:1 on this particular setup (and the compression hit
> isn't consistent across all hardware or setups, it's case by case -
> and hence the proposal option for compression indicates applying it
> selectively to locations we know there's a benefit across the board).
> But also you can tell there's no read time (decompression) hit from
> this same data set.

It is nice to see, although I'm pretty surprised they all have the same performance except the one with compression. Could it be because all the files got cached in RAM? If you tested by doing a `git clone` and then running the build, I'm pretty sure they did. I don't know exactly how filesystems behave when files are cached, but I wouldn't be surprised if a number of filesystem-specific code paths were skipped in that case.
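By the way, if one wanted to rule the page cache out of such a comparison, a rough sketch would be to drop it between runs (the source path and make invocation below are just placeholders, not what was actually benchmarked):

    cd ~/linux                     # placeholder for whatever you compile
    time make -j"$(nproc)"         # warm run: populates the page cache
    make clean
    sync                           # flush dirty pages to disk first
    echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries, inodes
    time make -j"$(nproc)"         # cold run: actually exercises the filesystem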
> Meanwhile, this is somewhere between embarrassing and comedy:
> https://www.phoronix.com/scan.php?page=article&item=linux-50-filesystems&num=4
>
> Hmmm, 21 seconds to launch GNOME Terminal with an NVMe and you aren't
> curious about what went wrong? Because obviously something is wrong.
> The measurement is wrong or the method is wrong or something in the
> setup is fouling things up. How do you get a fast result with SSD but
> then such a slow result with NVMe?
>
> It makes no sense, but meh, we'll just publish that shit anyway! LOLZ!
> And that is how you light your credibility on fire, because you just
> don't give a crap about it.

You misread it: the NVMe startup time is 1.03 sec; the 21.01 sec figure is for the SATA 3.0 SSD. No need to swear. That result does look odd compared to the others, but we can only guess why.

> On my 9 year old laptop with a mere Samsung 840 EVO, barely under 1
> second for GNOME Terminal to launch, following a reboot and login so
> this is not the result of caching. On my much newer HP Spectre with
> NVMe, under 0.5s to launch.
>
> My methodology and metrology? I'm using the "one mississippi" method
> from finger click of the actual app icon to the time I see a cursor in
> the launched app. Not rocket science.

Good for you. But you're trying to make a decision for all other people, so you need to take into account that not everyone has an NVMe or a SATA SSD. The HDDs many people are still using are much slower, which means your "1 second vs 0.5 seconds" can easily turn into "5 seconds vs 10 seconds" (and not necessarily scaling linearly).

> > As a matter of fact, I have two Archlinux laptops on BTRFS with
> > compression; both only have HDD. I've been using BTRFS there for 3-4
> > years I think, maybe more. I made use of BTRFS because I was hoping
> > that using ZSTD would result in less IO. Well, now my overall
> > experience is that it is not rare for the systems to start lagging
> > terribly; then I execute `grep "" /proc/pressure/*` and see someone is
> > hogging IO. Then I pop up `iotop -a` and see, among various processes,
> > a `[btrfs-cleaner]` and `[btrfs-transacti]`. It may be because of the
> > defrag option, I'm not sure…
>
> There are many btrfs threads. Those actually make it more performant.
> If you look at their total cpu time though, e.g. ps aux, you'll see
> it's really small compared to most anything else you might think is
> idle.
>
> root       366  0.0  0.0       0     0 ?  S   Jun25  1:22 [btrfs-transacti]
> root       500  0.0  0.0       0     0 ?  S   Jun25  1:45 [irq/135-iwlwifi]
> dbus       538  0.0  0.0  271548  6968 ?  S   Jun25  1:13 dbus-broker --log 4 --controller 9 --machine-id ce3f1eade82d42bd891a8c15714b13cf --max-bytes 536870912 --m
> root      1328  0.0  0.1 1273476 10116 ?  Sl  Jun25  3:00 /opt/teamviewer/tv_bin/teamviewerd -d
>
> There is in fact a WTF moment as a result of this partial listing and
> it's not btrfs.
>
> BTW this is 2 days of uptime.

You misread me: I wasn't talking about CPU time, I was talking about IO.
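To illustrate what I mean by "talking about IO", here is roughly what I look at when the lag hits. A sketch only; it assumes a kernel with CONFIG_PSI (4.20+) and task I/O accounting enabled:

    # "some" = share of time at least one task was stalled on I/O,
    # "full" = share of time all non-idle tasks were stalled at once.
    grep "" /proc/pressure/io

    # Accumulated block-layer I/O of the transaction-commit thread;
    # the read_bytes/write_bytes lines are what matters here, not CPU time.
    sudo cat /proc/"$(pgrep -o btrfs-transacti)"/io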