Re: [GIT PULL] bcachefs

Jens Axboe <axboe@xxxxxxxxx> · Wed, 28 Jun 2023 19:33:18 -0600

On 6/28/23 7:00 PM, Dave Chinner wrote:
> On Wed, Jun 28, 2023 at 07:50:18PM -0400, Kent Overstreet wrote:
>> On Wed, Jun 28, 2023 at 05:14:09PM -0600, Jens Axboe wrote:
>>> On 6/28/23 4:55 PM, Kent Overstreet wrote:
>>>>> But it's not aio (or io_uring or whatever), it's simply the fact that
>>>>> doing an fput() from an exiting task (for example) will end up being
>>>>> done async. And hence waiting for task exits is NOT enough to ensure
>>>>> that all file references have been released.
>>>>>
>>>>> Since there are a variety of other reasons why a mount may be pinned and
>>>>> fail to umount, perhaps it's worth considering that changing this
>>>>> behavior won't buy us that much. Especially since it's been around for
>>>>> more than 10 years:
>>>>
>>>> Because it seems that before io_uring the race was quite a bit harder to
>>>> hit - I only started seeing it when things started switching over to
>>>> io_uring. generic/388 used to pass reliably for me (pre backpointers),
>>>> now it doesn't.
>>>
>>> I literally just pasted a script that hits it in one second with aio. So
>>> maybe generic/388 doesn't hit it as easily, but it's surely TRIVIAL to
>>> hit with aio. As demonstrated. The io_uring is not hard to bring into
>>> parity on that front, here's one I posted earlier today for 6.5:
>>>
>>> https://lore.kernel.org/io-uring/20230628170953.952923-4-axboe@xxxxxxxxx/
>>>
>>> Doesn't change the fact that you can easily hit this with io_uring or
>>> aio, and probably more things too (didn't look any further). Is it a
>>> realistic thing outside of funky tests? Probably not really, or at least
>>> if those guys hit it they'd probably have the work-around hack in place
>>> in their script already.
>>>
>>> But the fact is that it's been around for a decade. It's somehow a lot
>>> easier to hit with bcachefs than XFS, which may just be because the
>>> former has a bunch of workers and this may be deferring the delayed fput
>>> work more. Just hand waving.
>>
>> Not sure what you're arguing here...?
>>
>> We've had a long standing bug, it's recently become much easier to hit
>> (for multiple reasons); we seem to be in agreement on all that. All I'm
>> saying is that the existence of that bug previously is not a reason not
>> to fix it now.
> 
> I agree with Kent here - the kernel bug needs to be fixed
> regardless of how long it has been around. Blaming the messenger
> (userspace, fstests, etc) and saying it should work around a
> spurious, unpredictable, undesirable and user-undebuggable kernel
> behaviour is not an acceptable solution here...

Not sure why you both are putting words in my mouth, I've merely been
arguing pros and cons and the impact of this. I even linked the io_uring
addition for ensuring that side will work better once the deferred fput
is sorted out. I didn't like the idea of fixing this through umount, and
even outlined how it could be fixed properly by ensuring we flush
per-task deferred puts on task exit.

Do I think it's a big issue? Not at all, because a) nobody has reported
it until now, and b) it's kind of a stupid case. If we can fix it with
minimal impact, should we? Yep. Particularly as the assumptions stated
in the original commit I referenced were not even valid back then.

-- 
Jens Axboe