Re: uring regression - lost write request

On 11/11/21 9:55 AM, Jens Axboe wrote:
> On 11/11/21 9:19 AM, Jens Axboe wrote:
>> On 11/11/21 8:29 AM, Jens Axboe wrote:
>>> On 11/11/21 7:58 AM, Jens Axboe wrote:
>>>> On 11/11/21 7:30 AM, Jens Axboe wrote:
>>>>> On 11/10/21 11:52 PM, Daniel Black wrote:
>>>>>>> Would it be possible to turn this into a full reproducer script?
>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just
>>>>>>> run and have it reproduce. If I install the 10.6 packages from above,
>>>>>>> then it doesn't seem to use io_uring or be linked against liburing.
>>>>>>
>>>>>> Sorry Jens.
>>>>>>
>>>>>> Hope containers are ok.
>>>>>
>>>>> Don't think I have a way to run that, don't even know what podman is
>>>>> and nor does my distro. I'll google a bit and see if I can get this
>>>>> running.
>>>>>
>>>>> I'm fine building from source and running from there, as long as I
>>>>> know what to do. Would that make it any easier? It definitely would
>>>>> for me :-)
>>>>
>>>> The podman approach seemed to work, and I was able to run all three
>>>> steps. Didn't see any hangs. I'm going to try again dropping down
>>>> the innodb pool size (box only has 32G of RAM).
>>>>
>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping
>>>> that up.
>>>>
>>>> Does your reproducer box have multiple NUMA nodes, or is it a single
>>>> socket/node box?
>>>
>>> Doesn't seem to reproduce for me on current -git. What file system are
>>> you using?
>>
>> I seem to be able to hit it with ext4, guessing it has more cases that
>> punt to buffered IO. As I initially suspected, I think this is a race
>> with buffered file write hashing. I have a debug patch that just turns
>> a regular non-NUMA box into multiple nodes. That may or may not be
>> needed to hit this, but I definitely can now. Looks like this:
>>
>> Node7 DUMP
>> index=0, nr_w=1, max=128, r=0, f=1, h=0
>>   w=ffff8f5e8b8470c0, hashed=1/0, flags=2
>>   w=ffff8f5e95a9b8c0, hashed=1/0, flags=2
>> index=1, nr_w=0, max=127877, r=0, f=0, h=0
>> free_list
>>   worker=ffff8f5eaf2e0540
>> all_list
>>   worker=ffff8f5eaf2e0540
>>
>> where we see node7 in this case having two work items pending, but the
>> worker state is stalled on hash.
>>
>> The hash logic was rewritten as part of the io-wq worker threads being
>> changed for 5.11 iirc, which is why that was my initial suspicion here.
>>
>> I'll take a look at this and make a test patch. Looks like you are able
>> to test self-built kernels, is that correct?
> 
> Can you try with this patch? It's against -git, but it will apply to
> 5.15 as well.

I think that one covered one potential gap, but I just managed to
reproduce a stall even with it. So hold off on testing that one, I'll send
you something more complete when I have confidence in it.

-- 
Jens Axboe



