Re: [GIT PULL] bcachefs

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 10 Aug 2023 22:53:27 -0700

On Thu, 10 Aug 2023 at 22:29, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> On Thu, Aug 10, 2023 at 10:20:22PM -0700, Linus Torvalds wrote:
> > If it's purely "umount doesn't succeed because the filesystem is still
> > busy with cleanups", then things are much better.
>
> That's exactly it. We have various tests that kill -9 fio and then
> umount, and umount spuriously fails.

Well, it sounds like Jens already has some handle on at least one
io_uring shutdown case that didn't wait for completion.

At the same time, a random -EBUSY is kind of an expected failure in
real life, since outside of strictly controlled environments you could
easily have just some entirely unrelated thing that just happens to
have looked at the filesystem when you tried to unmount it.

So any real-life use tends to use umount in a (limited) loop. It might
just make sense for the fsstress test scripts to do the same
regardless.

There's no actual good reason to think that -EBUSY is a hard error. It
very much can be transient.
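
As a rough sketch of what such a limited retry loop might look like in a
test harness (this is not from the thread: the names umount_retry and
sys_umount, the attempt count, and the delay are all illustrative
assumptions), something along these lines in Python would retry only on
EBUSY and still fail hard on anything else:

```python
import ctypes
import errno
import os
import time


def sys_umount(target):
    """Call umount2(2) directly so the real errno is visible (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.umount2(os.fsencode(target), 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


def umount_retry(target, attempts=5, delay=1.0,
                 umount=sys_umount, sleep=time.sleep):
    """Unmount target, retrying a bounded number of times on EBUSY.

    Returns the number of attempts it took. Any error other than EBUSY,
    or EBUSY on the final attempt, is re-raised to the caller.
    The attempts/delay defaults are arbitrary illustrative values.
    """
    for attempt in range(1, attempts + 1):
        try:
            umount(target)
            return attempt
        except OSError as e:
            # Only EBUSY is treated as transient; give up once the
            # attempt budget is exhausted.
            if e.errno != errno.EBUSY or attempt == attempts:
                raise
        sleep(delay)
```

The loop is deliberately bounded: a filesystem that is still busy after
the last attempt is reported as a real failure rather than retried
forever.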

In fact, I have this horrible flash-back memory to some auto-expiry
scripts that used to do the equivalent of "umount -a -t autofs" every
minute or so as a horrible model for expiring things, happy and secure
in the knowledge that if the filesystem was still in active use, it
would just fail.

So may I suggest that even if the immediate issue ends up being sorted
out, just from a robustness standpoint the "consider EBUSY a hard
error" seems to be a mistake.

Transient failures are pretty much expected, and not all of them are
necessarily kernel-related (i.e. think "concurrent updatedb run" or any
number of other possibilities).

          Linus