On Sun, 2020-06-28 at 15:40 -0600, Chris Murphy wrote:
> On Sun, Jun 28, 2020 at 9:04 AM <alexandrebfarias@xxxxxxxxx> wrote:
>
> > I'm willing to perform further testing. There shouldn't be anything
> > very special about my workload. I was working mostly with NodeJS 12
> > and React Native. VS Code (I should mention I make use of TabNine,
> > which can be a huge drain on system resources). So, in a typical
> > work session I'd have the Android emulator open, PostgreSQL, some
> > Chrome tabs, VS Code, probably Emacs, plus the React Native Metro
> > server and an Express.js backend.
>
> Databases and VM images are things btrfs is bad at out of the box.
> Most of this has to do with the fsync dependency of other file
> systems. Btrfs is equipped to deal with an fsync-heavy world out of
> the box, using treelog, enabled by default. But it can still be slow
> for some workloads.

Can we do enough to make for a pleasant user experience? Are the btrfs
mitigations sufficient? Do we have good enough userspace tools to
actually take advantage of the new BTRFS features? At this point, I'm
fine with what I have, and BTRFS usage would be strictly for testing.

Also, is there any reason why RHEL went with XFS as a default while
Fedora stayed with ext4? If it was a conscious choice, the rationale
then seems to be the exact opposite of the rationale for making BTRFS
the new default.

> (a) small for the workload and (b) not getting any hints about what's
> freed up for it to prepare for future writes. The SSD is trying to
> erase blocks right at the moment of allocation - super slow for any
> SSD to do that.

That's a strong possibility. I did increase the partition size and
things were better for a while, after defragmenting, etc. The small
initial size for the partition is what I believe many users will choose
when trying a new operating system. I wasn't even sure I wanted to jump
ship, and that was the space I could initially spare. Isn't that the
case for many users?
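For anyone who wants to see what the fsync-heavy pattern Chris
describes looks like in practice, here is a rough, purely illustrative
shell sketch (the file location and iteration count are arbitrary
choices of mine) that times a burst of small synced appends on
whatever filesystem hosts your temp directory:

```shell
# Rough benchmark sketch: time 50 small appends that each force data
# to disk, the access pattern databases generate. All numbers here are
# arbitrary; this is illustrative, not a proper benchmark.
f=$(mktemp)
start=$(date +%s%N)
for i in $(seq 1 50); do
  # conv=notrunc,fsync: append without truncating, then sync the file.
  dd if=/dev/zero of="$f" bs=4k count=1 oflag=append \
     conv=notrunc,fsync 2>/dev/null
done
end=$(date +%s%N)
rm -f "$f"
echo "50 fsynced 4k appends took $(( (end - start) / 1000000 )) ms"
```

Running it against an ext4, XFS, and btrfs mount would give a crude
relative comparison; the absolute numbers mean little on their own.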
I mean, my Thinkpad X220 still has BTRFS on it, at much worse SSD/SATA
speeds (WD Green), and I never experienced performance issues there.
But then, I just went ahead and added nodatacow to prevent it from
going south like the other notebook.

> Stick with what's working. Use XFS. This is also consistent with
> Facebook's workloads still on XFS. But if you really wanna give btrfs
> a shot at your workload, there are three possible optimizations:

Will the average user really benefit from BTRFS? I really like the new
stuff, but with so many rough edges, I find it hard to put it forward
for general use yet. Of course Fedora is kind of bleeding edge, but as
of now, I believe it's hard to justify all the hoops you have to jump
through for a default option.

> 1. Mount option space_cache=v2 (this will be the default soon),
> discard=async.

I actually tried space_cache=v2; I never got around to trying
discard=async.

> 2. Mount option flushoncommit (you'll get benign, but annoying,
> WARN_ONs in dmesg). And use fsync = off in postgresql.conf (really
> everywhere you can).
>
> Note: if you get a crash you'll lose the last ~30s of commits, but
> the database and the file system are expected to be consistent. The
> commit interval is configurable, defaults to 30s. I suggest leaving
> it there for testing. It is mainly a risk vs. performance assessment,
> as to why you'd change it.

I tried going with a 120s commit interval. I never used flushoncommit
either.

> 3. VM images have two schools of thought, depending on your risk
> tolerance.
>
>    A. nodatacow (chattr +C). Use with cache=writeback.
>       flushoncommit isn't necessary.
>
>    B. datacow. Use with compression (mount option or chattr +c).
>       Use with cache=unsafe. flushoncommit highly recommended.

Yeah, nodatacow did make things better, but still significantly behind
XFS and probably even ext4. And I think the usability issues could be
more than a mild nuisance.

> And yeah, how would anyone know all of this?
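To make Chris's suggestions concrete, here is a hypothetical sketch of
what the mount-option and nodatacow parts look like as configuration.
The device path and mountpoint are placeholders I made up, and chattr
+C only has an effect on a real btrfs filesystem (on anything else it
fails harmlessly, which the sketch reports):

```shell
# Sketch: an /etc/fstab entry carrying the suggested mount options.
# /dev/sda3 and /home are placeholders for your own system.
DEV=/dev/sda3
MNT=/home
OPTS="defaults,space_cache=v2,discard=async,flushoncommit"
FSTAB_LINE="$DEV  $MNT  btrfs  $OPTS  0 0"
echo "$FSTAB_LINE"

# Sketch: mark a fresh directory nodatacow for VM images (option A).
# chattr +C must be set while the directory is empty; new files under
# it inherit the flag. On a non-btrfs test system this just reports
# that +C isn't supported.
mkdir -p ./vm-images
chattr +C ./vm-images 2>/dev/null \
  || echo "note: chattr +C needs a filesystem that supports nodatacow"
rmdir ./vm-images   # clean up the demo directory
```

The `fsync = off` part of suggestion 2 is a one-line change in
postgresql.conf, and as Chris says, purely a risk vs. performance
trade.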
> And is it an opportunity for docs (probably) or desktop integration?
> Detect this workload or ask the user? I'm not sure.

Do we know all of the workloads that could be disrupted by BTRFS? With
chattr +C or nodatacow, is BTRFS any better than LVM or other existing
solutions? Have all options been considered? Is BTRFS really the
filesystem of the future? Has it been compared with other less-known
but production-proven filesystems like NILFS? F2FS? BTRFS has enough
caveats that its popularity doesn't seem a compelling enough argument
to narrow the field down to two contenders just yet. Also, I wouldn't
find it very appealing to tell end users they can't use all of their
drive's space or else their PC might slow down to a crawl.

> [1] From your email, the kickstart shows
>
> part btrfs.475 --fstype="btrfs" --ondisk=sda --size=93368
>
> 93G is likely making things worse for your SSD. It's small for this
> workload. Chances are if it were bigger, it'd cope better by
> effectively being over-provisioned, and it'd more easily get erase
> blocks ready. But discard=async will mitigate this.

Are there any tests out there which could realistically mimic those
scenarios? Not just the fresh-out-of-the-box scenario, but after a few
weeks of usage. I get the feeling that if you don't have enough space,
even defragmenting ends up being ineffective. Does the BTRFS
defragment command produce the same results on a full disk as it would
on an over-provisioned system? I understand it'll obviously take
longer with less space, but there's no obvious reason to me why the
end results should be qualitatively so disparate. (Not sure if this is
actually the case; I'm just hypothesizing based on my anecdotal
evidence.)

Anyway, if a proper test flow can be established for those awkward
scenarios, I for one might be able to find the time to provision a
BTRFS partition spanning the whole drive and put those hypothetical
questions to the test.
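As a back-of-the-envelope illustration of the over-provisioning point
in Chris's footnote: assuming, hypothetically, a 256 GiB drive (the
thread doesn't state the drive size, only the 93 GiB partition from
the kickstart), the unpartitioned remainder is flash the controller
never has to preserve, which works like extra over-provisioning:

```shell
# Back-of-envelope sketch: share of the drive left unpartitioned,
# which the SSD controller can treat as extra over-provisioning.
# DRIVE_GIB=256 is a hypothetical drive size; PART_GIB=93 comes from
# the kickstart quoted above.
DRIVE_GIB=256
PART_GIB=93
SLACK_PCT=$(( (DRIVE_GIB - PART_GIB) * 100 / DRIVE_GIB ))
echo "unpartitioned slack: ${SLACK_PCT}% of the drive"
```

With discard=async (or a periodic fstrim) the same "this space is
free" hints reach the drive even on a full-size partition, which is
presumably why Chris says discard=async mitigates the small-partition
problem.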
At this point, with so many questions and so few answers, I really
think there needs to be some objective data to justify this decision.
If it were a less significant system, perhaps it would still be
possible to go ahead with this proposal without having many answers.
But when making a decision about a filesystem, it would be wise to get
a clearer picture of the consequences of this choice. If it doesn't
happen now, it will have to happen later in a much worse way: probably
a significant number of people will come and complain that they can
barely use their computers and have no idea why.

Sticking with ext4 forever just because it works would be foolish. But
that doesn't mean it's wise to desperately commit to BTRFS. Wouldn't
it be better to make a much smaller change, like going with XFS as a
default just as RHEL did, and encourage people to test BTRFS in the
meanwhile until there are better answers? Can those answers be
provided without real-world testing? I really don't know, but all
those questions come to mind. I've tinkered with Linux since I was 10
years old and have surely broken my system in almost every way
possible, but this experience with BTRFS really stands out.
Subjectively, even ReiserFS when it was highly experimental didn't
seem to have as many issues as BTRFS has right now, after (7?) years
of being marked as 'stable'.

--
Alexandre
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx