Re: [bug report] WARNING: CPU: 5 PID: 679 at io_uring/io_uring.c:2835 io_ring_exit_work+0x2b6/0x2e0


 



On 4/16/24 6:14 AM, Pavel Begunkov wrote:
> On 4/16/24 12:38, Jens Axboe wrote:
>> On 4/16/24 4:00 AM, Ming Lei wrote:
>>> On Tue, Apr 16, 2024 at 10:26:16AM +0800, Changhui Zhong wrote:
>>>>>>
>>>>>> I can't reproduce this here, fwiw. Ming, something you've seen?
>>>>>
>>>>> I just tested against the latest for-next/block (-rc4 based) and still can't
>>>>> reproduce it. There was a similar RH internal report before, and it may not be
>>>>> ublk related.
>>>>>
>>>>> Changhui, if the issue can be reproduced in your machine, care to share
>>>>> your machine for me to investigate a bit?
>>>>>
>>>>> Thanks,
>>>>> Ming
>>>>>
>>>>
>>>> I can still reproduce this issue on my machine,
>>>> and I shared the machine with Ming; he can do more investigation into this issue.
>>>>
>>>> [ 1244.207092] running generic/006
>>>> [ 1246.456896] blk_print_req_error: 77 callbacks suppressed
>>>> [ 1246.456907] I/O error, dev ublkb1, sector 2395864 op 0x1:(WRITE)
>>>> flags 0x8800 phys_seg 1 prio class 0
>>>
>>> The failure is actually triggered when recovering the qcow2 target in generic/005,
>>> since ublkb0 isn't removed successfully in generic/005.
>>>
>>> git-bisect shows that the 1st bad commit is cca6571381a0 ("io_uring/rw:
>>> cleanup retry path").
>>>
>>> And I don't see any issue on the uring command side, so the trouble seems
>>> to be on the normal io_uring rw side over an XFS file, and not related to the
>>> block device.
>>
>> Indeed, I can reproduce it on XFS as well. I'll take a look.
> 
> Looking at this patch, that io_rw_should_reissue() path is for when
> we failed via the kiocb callback but came there off of the submission
> path, so when we unwind back it finds the flag, preps and resubmits
> the req. If that's not the case but we still return "true", it'd leak
> the request, which would explain why exit_work gets stuck.

Yep, this is what happens. I have a test patch that just punts any
reissue to task_work; it'll insert to iowq from there. Before, we would
fail the request even though we didn't have to, but that check was
killed, so now it just lingers for this case and gets lost.
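To make the leak scenario concrete, here is a compilable user-space
sketch, not the actual io_uring code and with invented names
(ring_exit_work, kiocb_done, should_reissue, etc.), that models what
Pavel describes: the completion callback flags the request for reissue,
only the submission-path unwind acts on that flag, and if the failure
didn't come from the submission path the request is never completed or
resubmitted, so the ring reference never drops and exit work waits
forever.

/*
 * Simplified user-space model of the leak (all names invented).
 */
#include <stdbool.h>
#include <stdio.h>

struct ring {
	int inflight;          /* requests holding a reference on the ring */
};

struct request {
	struct ring *ring;
	bool reissue_flagged;  /* set by the completion callback           */
	bool in_submit_path;   /* true if unwinding from the submit path   */
};

static void complete_request(struct request *req, int res)
{
	req->ring->inflight--;          /* drop the ring reference */
	printf("completed with %d, inflight=%d\n", res, req->ring->inflight);
}

/* Completion callback decides whether the request should be retried. */
static bool should_reissue(struct request *req)
{
	/* The problem being discussed: returning true here is only safe
	 * if the submission-path unwind will actually pick the req up. */
	return true;
}

static void kiocb_done(struct request *req, int res)
{
	if (res < 0 && should_reissue(req)) {
		req->reissue_flagged = true;
		return;                 /* "someone else will resubmit" */
	}
	complete_request(req, res);
}

static void submit_path_unwind(struct request *req)
{
	if (req->reissue_flagged) {
		printf("resubmitting from submission path\n");
		complete_request(req, 0);   /* eventually completes */
	}
}

static void ring_exit_work(struct ring *ring)
{
	/* The real io_ring_exit_work() waits for inflight to reach zero
	 * and warns (the io_uring.c:2835 WARNING) when it does not. */
	if (ring->inflight)
		printf("WARNING: exit_work stuck, %d request(s) leaked\n",
		       ring->inflight);
}

int main(void)
{
	struct ring ring = { .inflight = 1 };
	struct request req = { .ring = &ring, .in_submit_path = false };

	kiocb_done(&req, -11 /* -EAGAIN */);    /* failure NOT on submit path */
	if (req.in_submit_path)
		submit_path_unwind(&req);       /* never runs: req is leaked */

	ring_exit_work(&ring);
	return 0;
}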
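And a rough sketch, again not the real patch and with stand-in names
(task_work_add_sim, iowq_queue_sim), of the direction mentioned above:
on failure the request always queues a task_work item, and that item
hands the retry to the io-wq worker, so the reissue can't be lost no
matter which context the failure came from.

/*
 * User-space sketch of "punt reissue to task_work, insert to iowq there".
 */
#include <stdio.h>

struct request;
typedef void (*tw_func)(struct request *);

struct request {
	tw_func tw;            /* pending task_work callback            */
	int     inflight;      /* stands in for the ring reference      */
};

/* Simulated task_work: queue a callback to run later in task context. */
static void task_work_add_sim(struct request *req, tw_func fn)
{
	req->tw = fn;
}

/* Simulated io-wq: the worker performs the blocking retry. */
static void iowq_queue_sim(struct request *req)
{
	printf("io-wq: retrying request\n");
	req->inflight = 0;     /* retry completes, reference dropped */
}

/* task_work callback: runs in a safe context, punts to io-wq. */
static void reissue_tw(struct request *req)
{
	iowq_queue_sim(req);
}

/* On failure, regardless of the calling context, queue task_work. */
static void kiocb_done(struct request *req, int res)
{
	if (res < 0) {
		task_work_add_sim(req, reissue_tw);
		return;
	}
	req->inflight = 0;
}

/* Pending task_work is always drained before the task goes away. */
static void run_task_work(struct request *req)
{
	if (req->tw) {
		req->tw(req);
		req->tw = NULL;
	}
}

int main(void)
{
	struct request req = { .inflight = 1 };

	kiocb_done(&req, -11 /* -EAGAIN */);
	run_task_work(&req);   /* the retry is never lost this way */

	printf("inflight=%d (0 means nothing leaked)\n", req.inflight);
	return 0;
}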

-- 
Jens Axboe




