On 16/04/2021 14:58, Jens Axboe wrote: > On 4/16/21 7:12 AM, Pavel Begunkov wrote: >> On 16/04/2021 14:04, Jens Axboe wrote: >>> On 4/15/21 6:26 PM, Pavel Begunkov wrote: >>>> On 16/04/2021 01:22, Pavel Begunkov wrote: >>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >>>> >>>> 1/2 is basically a rip off of one of old Jens' patches, but can't >>>> find it anywhere. If you still have it, especially if it was >>>> reviewed/etc., may make sense to go with it instead >>> >>> I wonder if we can do something like the below instead - we don't >>> care about a particularly stable count in terms of wakeup >>> reliance, and it'd save a nasty sync atomic switch. >> >> But we care about it being monotonous. There are nuances with it. > > Do we, though? We care about it changing when something has happened, > but not about it being monotonic. We may find inflight == get_inflight(), when it's not really so, and so get to schedule() awhile there are pending requests that are not going to be cancelled by itself. And those pending requests may have been non-discoverable and so non-cancellable, e.g. because were a part of a ling/hardlink. >> I think, non sync'ed summing may put it to eternal sleep. > > That's what the two reads are about, that's the same as before. The > numbers are racy in both cases, but that's why we compare after having > added ourselves to the wait queue. > >> Are you looking to save on switching? It's almost always is already >> dying with prior ref_kill > > Yep, always looking to avoid a sync switch if at all possible. For 99% > of the cases it's fine, it's the last case in busy prod that wreaks > havoc. Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll doesn't have many file notes (as it was called before). We can completely avoid it and even make faster if happens from sq_thread() on it getting to exit, but do we want it for 5.12? -- Pavel Begunkov