Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

Florian Weimer <fweimer@xxxxxxxxxx> · Tue, 30 Jun 2020 11:58:29 +0200

* Steven Whitehouse:

> On 27/06/2020 11:00, Florian Weimer wrote:
>> * Josef Bacik:
>>
>>> As for your ENOSPC issue, I've made improvements on that area.  I
>>> see this in production as well, I have monitoring in place to deal
>>> with the machine before it gets to this point.  That being said if
>>> you run the box out of metadata space things get tricky to fix.
>>> I've been working my way down the list of issues in this area for
>>> years, this last go around of patches I sent were in these corner
>>> cases.
>> Is there anything we need to do in userspace to improve the behavior
>> of fflush and similar interfaces?
>>
>> This is not strictly a btrfs issue: Some of us are worried about
>> scenarios where the write system call succeeds and the data never
>> makes it to storage *without a catastrophic failure*.  (I do not
>> consider running out of disk space a catastrophic failure.)  NFS
>> apparently has this property, and you have to call fsync or close the
>> descriptor to detect this.  fsync is not desirable due to its
>> performance impact.
>
> It doesn't matter which filesystem you use, you can't be sure that the
> data is really safe on disk without calling fsync. In the case of a
> new inode, that means fsync on the file and on the containing
> directory.
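A minimal sketch of the file-plus-directory fsync sequence described in the quoted text, assuming a POSIX system.  create_durably and write_all are hypothetical helper names used only for illustration.  The new file is flushed first, then the directory that holds its entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, retrying on short
   writes and EINTR.  Returns 0 on success or an errno value. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: create dir/name with the given contents and
   make both the file data and its directory entry durable.
   Returns 0 on success or an errno value. */
int create_durably(const char *dir, const char *name,
                   const char *data, size_t len)
{
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return errno;

    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        int e = errno;
        close(dfd);
        return e;
    }

    int err = write_all(fd, data, len);
    /* fsync on the file flushes its data and inode. */
    if (err == 0 && fsync(fd) != 0)
        err = errno;
    /* close can also report deferred write errors (e.g. on NFS). */
    if (close(fd) != 0 && err == 0)
        err = errno;
    /* fsync on the directory makes the new directory entry durable. */
    if (err == 0 && fsync(dfd) != 0)
        err = errno;
    close(dfd);
    return err;
}
```

Skipping the directory fsync leaves a window in which the file contents are on disk but the name pointing at them is not.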

In my opinion, there is a conceptual difference between the machine or
storage crashing hard, and just running out of disk space.

> There can be performance issues depending on how that is done, however
> there are a number of solutions to those issues which can reduce the
> performance effects to the point where they are usually no longer a
> problem. That is with the caveat that slow storage will always be
> slow, of course!
>
> The usual tricks are to avoid doing lots of small fsyncs, by gathering
> up smaller files, ideally sorting them into inode number order for
> local filesystems, and then issuing fsyncs asynchronously, waiting for
> them all only once all the fsyncs have been issued. Also
> fadvise/madvise can be useful in these situations too,
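The batching trick from the quoted text can be sketched as follows, assuming Linux.  fsync_batch is a hypothetical name.  The files are sorted by inode number, writeback is started on all of them without waiting, and only then does a second pass wait for each one.  sync_file_range with SYNC_FILE_RANGE_WRITE is used here as a stand-in for "issuing fsyncs asynchronously"; it only initiates data writeback and does not flush metadata or the device cache, which is why the second pass still calls fsync.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct pending { int fd; ino_t ino; };

/* qsort comparator: ascending inode number. */
static int by_inode(const void *a, const void *b)
{
    ino_t x = ((const struct pending *)a)->ino;
    ino_t y = ((const struct pending *)b)->ino;
    return (x > y) - (x < y);
}

/* Hypothetical helper: flush a batch of already-written files.
   Returns 0 if every fsync succeeded, else the first errno seen.
   The descriptors are not closed. */
int fsync_batch(const int *fds, size_t n)
{
    if (n == 0)
        return 0;
    struct pending *p = malloc(n * sizeof *p);
    if (p == NULL)
        return ENOMEM;

    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (fstat(fds[i], &st) != 0) {
            int e = errno;
            free(p);
            return e;
        }
        p[i].fd = fds[i];
        p[i].ino = st.st_ino;
    }
    /* Inode order tends to match on-disk layout on local filesystems. */
    qsort(p, n, sizeof *p, by_inode);

    /* Pass 1: start writeback everywhere without waiting.  This is only
       a hint (Linux-specific), so failures are ignored here. */
    for (size_t i = 0; i < n; i++)
        (void)sync_file_range(p[i].fd, 0, 0, SYNC_FILE_RANGE_WRITE);

    /* Pass 2: wait for each file; most data should already be in flight. */
    int err = 0;
    for (size_t i = 0; i < n; i++)
        if (fsync(p[i].fd) != 0 && err == 0)
            err = errno;
    free(p);
    return err;
}
```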

None of this applies to shell utilities such as grep and cat.  They guard
against data loss caused by the write system call not reporting ENOSPC
errors by closing stdout and stderr underneath glibc, which leads to a
different class of problems.  It turns out that on Linux, close performs
more space checks than write, so this lets the shell utilities detect
ENOSPC without issuing fsyncs.  At present, write failing to check for
space seems to happen primarily with NFS.
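A minimal sketch of the close-and-check pattern described above, assuming glibc stdio.  close_stream_checked is a hypothetical name, loosely modeled on what utilities do at exit.  Because stdio buffers output, a buffered write can appear to succeed, and the ENOSPC only shows up when fclose flushes the buffer and closes the descriptor.  On Linux, /dev/full fails every write with ENOSPC, which makes the effect easy to observe.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: close a stdio stream and report any error that
   buffering deferred until now.  Returns 0 or an errno value. */
int close_stream_checked(FILE *f)
{
    /* A sticky error flag means an earlier buffered write already failed. */
    int had_error = ferror(f);
    errno = 0;
    /* fclose flushes the remaining buffer and then calls close(2),
       either of which can report ENOSPC. */
    if (fclose(f) != 0)
        return errno != 0 ? errno : EIO;
    /* The error flag was set but fclose succeeded: the original errno
       is no longer available, so report a generic I/O error. */
    return had_error ? EIO : 0;
}
```

A utility would typically call close_stream_checked(stdout) from an atexit handler and turn a nonzero result into a diagnostic and a failing exit status.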

So let me rephrase: Does btrfs report ENOSPC during write?  If it does
not, what can we do to check for sufficient space during fflush and
similar operations?
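One possible userspace approach, sketched here under explicit assumptions, is to reserve space with posix_fallocate before writing, so that ENOSPC is reported before any data is handed to write.  reserve_then_write is a hypothetical helper; it assumes the descriptor was not opened with O_APPEND (the reservation uses the current offset).  Whether a reservation really guarantees that the later write cannot fail for lack of space depends on the filesystem, and on copy-on-write filesystems that is precisely the open question.  glibc may also fall back to writing zeros when the filesystem does not support fallocate, which is slow.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve space for len bytes at the current offset,
   then write them.  Returns 0 or an error number.  Assumes no O_APPEND. */
int reserve_then_write(int fd, const char *buf, size_t len)
{
    off_t off = lseek(fd, 0, SEEK_CUR);
    if (off < 0)
        return errno;
    if (len == 0)
        return 0;  /* posix_fallocate rejects a zero length with EINVAL */

    /* posix_fallocate returns the error number directly; it does not
       set errno.  A lack of space is reported here, before writing. */
    int err = posix_fallocate(fd, off, (off_t)len);
    if (err != 0)
        return err;

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```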

If we change the shell utilities to do an fsync on close, we get
traditional UNIX behavior with traditional UNIX performance.  I don't
think that's what people want.
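For comparison, the fsync-on-close alternative mentioned above could look roughly like this sketch, assuming POSIX stdio.  fsync_and_close is a hypothetical name.  Since stdout is often a pipe or terminal, where fsync is not meaningful, the sketch only fsyncs regular files and otherwise just flushes and closes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: flush a stream, fsync it if it refers to a
   regular file, then close it.  Returns 0 or an errno value. */
int fsync_and_close(FILE *f)
{
    int err = 0;
    /* Push stdio's buffer into the kernel first. */
    if (fflush(f) != 0)
        err = errno;

    /* Only regular files are fsynced; pipes, terminals and sockets
       are just flushed and closed. */
    struct stat st;
    if (err == 0 && fstat(fileno(f), &st) == 0 && S_ISREG(st.st_mode)
        && fsync(fileno(f)) != 0)
        err = errno;

    /* Keep the first error, but still close the stream. */
    if (fclose(f) != 0 && err == 0)
        err = errno;
    return err;
}
```

Every close of a regular output file would then wait for the storage device, which is the performance cost at issue.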

Thanks,
Florian
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx