On 16/04/2021 15:09, Pavel Begunkov wrote: > On 16/04/2021 14:58, Jens Axboe wrote: >> On 4/16/21 7:12 AM, Pavel Begunkov wrote: >>> On 16/04/2021 14:04, Jens Axboe wrote: >>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote: >>>>> On 16/04/2021 01:22, Pavel Begunkov wrote: >>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >>>>> >>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't >>>>> find it anywhere. If you still have it, especially if it was >>>>> reviewed/etc., may make sense to go with it instead >>>> >>>> I wonder if we can do something like the below instead - we don't >>>> care about a particularly stable count in terms of wakeup >>>> reliance, and it'd save a nasty sync atomic switch. >>> >>> But we care about it being monotonous. There are nuances with it. >> >> Do we, though? We care about it changing when something has happened, >> but not about it being monotonic. > > We may find inflight == get_inflight(), when it's not really so, > and so get to schedule() awhile there are pending requests that > are not going to be cancelled by itself. And those pending requests > may have been non-discoverable and so non-cancellable, e.g. because > were a part of a ling/hardlink. Anyway, there might be other problems because of how wake_up()'s and ctx->refs putting is ordered. Needs to be remade, probably without ctx->refs in the first place. >>> I think, non sync'ed summing may put it to eternal sleep. >> >> That's what the two reads are about, that's the same as before. The >> numbers are racy in both cases, but that's why we compare after having >> added ourselves to the wait queue. >> >>> Are you looking to save on switching? It's almost always is already >>> dying with prior ref_kill >> >> Yep, always looking to avoid a sync switch if at all possible. For 99% >> of the cases it's fine, it's the last case in busy prod that wreaks >> havoc. > > Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll > doesn't have many file notes (as it was called before). We can > completely avoid it and even make faster if happens from sq_thread() > on it getting to exit, but do we want it for 5.12? > -- Pavel Begunkov