Re: [PATCH 0/2] fix hangs with shared sqpoll

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/04/2021 15:09, Pavel Begunkov wrote:
> On 16/04/2021 14:58, Jens Axboe wrote:
>> On 4/16/21 7:12 AM, Pavel Begunkov wrote:
>>> On 16/04/2021 14:04, Jens Axboe wrote:
>>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>>>
>>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>>>> find it anywhere. If you still have it, especially if it was
>>>>> reviewed/etc., may make sense to go with it instead
>>>>
>>>> I wonder if we can do something like the below instead - we don't
>>>> care about a particularly stable count in terms of wakeup
>>>> reliance, and it'd save a nasty sync atomic switch.
>>>
>>> But we care about it being monotonous. There are nuances with it.
>>
>> Do we, though? We care about it changing when something has happened,
>> but not about it being monotonic.
> 
> We may find inflight == get_inflight(), when it's not really so,
> and so get to schedule() awhile there are pending requests that
> are not going to be cancelled by itself. And those pending requests
> may have been non-discoverable and so non-cancellable, e.g. because
> were a part of a ling/hardlink.

Anyway, there might be other problems because of how wake_up()'s
and ctx->refs putting is ordered. Needs to be remade, probably
without ctx->refs in the first place.

>>> I think, non sync'ed summing may put it to eternal sleep.
>>
>> That's what the two reads are about, that's the same as before. The
>> numbers are racy in both cases, but that's why we compare after having
>> added ourselves to the wait queue.
>>
>>> Are you looking to save on switching? It's almost always is already
>>> dying with prior ref_kill
>>
>> Yep, always looking to avoid a sync switch if at all possible. For 99%
>> of the cases it's fine, it's the last case in busy prod that wreaks
>> havoc.
> 
> Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll
> doesn't have many file notes (as it was called before). We can
> completely avoid it and even make faster if happens from sq_thread()
> on it getting to exit, but do we want it for 5.12?
> 

-- 
Pavel Begunkov



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux