On Sat, Jul 20, 2019 at 2:27 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 7/18/19 6:54 PM, Zhengyuan Liu wrote:
> >
> > On 7/19/19 12:43 AM, Jens Axboe wrote:
> >> On 7/18/19 9:41 AM, Jens Axboe wrote:
> >>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
> >>>> There is a hang issue while using fio to do some basic tests. The
> >>>> issue can be easily reproduced with the script below:
> >>>>
> >>>> while true
> >>>> do
> >>>>     fio --ioengine=io_uring -rw=write -bs=4k -numjobs=1 \
> >>>>         -size=1G -iodepth=64 -name=uring --filename=/dev/zero
> >>>> done
> >>>>
> >>>> After several minutes, maybe more, fio blocks at
> >>>> io_uring_enter->io_cqring_wait, waiting for previously committed
> >>>> sqes to complete, and can't return to user space until we send it a
> >>>> SIGTERM. After getting SIGTERM, fio hangs in
> >>>> io_ring_ctx_wait_and_kill with a backtrace like this:
> >>>>
> >>>> [54133.243816] Call Trace:
> >>>> [54133.243842]  __schedule+0x3a0/0x790
> >>>> [54133.243868]  schedule+0x38/0xa0
> >>>> [54133.243880]  schedule_timeout+0x218/0x3b0
> >>>> [54133.243891]  ? sched_clock+0x9/0x10
> >>>> [54133.243903]  ? wait_for_completion+0xa3/0x130
> >>>> [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
> >>>> [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
> >>>> [54133.243951]  wait_for_completion+0xab/0x130
> >>>> [54133.243962]  ? wake_up_q+0x70/0x70
> >>>> [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
> >>>> [54133.243998]  io_uring_release+0x20/0x30
> >>>> [54133.244008]  __fput+0xcf/0x270
> >>>> [54133.244029]  ____fput+0xe/0x10
> >>>> [54133.244040]  task_work_run+0x7f/0xa0
> >>>> [54133.244056]  do_exit+0x305/0xc40
> >>>> [54133.244067]  ? get_signal+0x13b/0xbd0
> >>>> [54133.244088]  do_group_exit+0x50/0xd0
> >>>> [54133.244103]  get_signal+0x18d/0xbd0
> >>>> [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
> >>>> [54133.244142]  do_signal+0x34/0x720
> >>>> [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
> >>>> [54133.244190]  exit_to_usermode_loop+0xc0/0x130
> >>>> [54133.244209]  do_syscall_64+0x16b/0x1d0
> >>>> [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >>>>
> >>>> The reason is that we added a req to ctx->pending_async at the very
> >>>> end, but it got no chance to be processed anymore. How could this
> >>>> happen?
> >>>>
> >>>> fio#cpu0                          wq#cpu1
> >>>>
> >>>> io_add_to_prev_work               io_sq_wq_submit_work
> >>>>
> >>>>   atomic_read() <<< 1
> >>>>
> >>>>                                     atomic_dec_return() << 1->0
> >>>>                                     list_empty()         <<< true
> >>>>
> >>>>   list_add_tail()
> >>>>   atomic_read() << 0 or 1?
> >>>>
> >>>> As atomic_ops.rst says, atomic_read does not guarantee that the
> >>>> runtime initialization by any other thread is visible yet, so we
> >>>> must take care of that with a proper implicit or explicit memory
> >>>> barrier.
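To make the ordering requirement concrete, here is a standalone userspace
model of the two code paths. The names mirror fs/io_uring.c, but the bodies
are simplified sketches built on C11 atomics (the fence stands in for
smp_mb(), atomic_fetch_sub for atomic_dec_return), not the kernel code; a
single run will not reliably reproduce the race, it only shows where the
barrier belongs. Build with: cc -O2 -pthread race_sketch.c

/*
 * race_sketch.c — userspace model of the ordering bug described above.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int cnt = 1;           /* async_list->cnt: one worker active */
static atomic_bool list_nonempty;    /* stand-in for async_list->list */

/* fio#cpu0 side: hand a req to the worker that is still running */
static void *submitter(void *arg)
{
	(void)arg;
	/* list_add_tail(&req->list, &list->list) */
	atomic_store_explicit(&list_nonempty, true, memory_order_relaxed);
	/*
	 * The smp_mb() under discussion: order the list insertion above
	 * against the cnt read below. Without it, this thread can still
	 * see cnt == 1 after the worker has dropped it to 0 and found
	 * the list empty, so nobody ever runs the queued req.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&cnt, memory_order_relaxed) == 0)
		printf("worker gone; submitter must requeue the req itself\n");
	else
		printf("worker still active; req handed off\n");
	return NULL;
}

/* wq#cpu1 side: tail of the worker, as in io_sq_wq_submit_work() */
static void *worker(void *arg)
{
	(void)arg;
	/* atomic_dec_return(&list->cnt): a fully ordered RMW */
	if (atomic_fetch_sub(&cnt, 1) == 1 &&
	    !atomic_load_explicit(&list_nonempty, memory_order_relaxed))
		printf("worker exits: cnt hit 0 with an empty list\n");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, submitter, NULL);
	pthread_create(&b, NULL, worker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
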
> >>> > >>> > >>> diff --git a/fs/io_uring.c b/fs/io_uring.c > >>> index 5ec06e5ba0be..3c2a6f88a6b0 100644 > >>> --- a/fs/io_uring.c > >>> +++ b/fs/io_uring.c > >>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) > >>> */ > >>> if (async_list) { > >>> ret = atomic_dec_return(&async_list->cnt); > >>> + smp_mb__after_atomic(); > >>> while (!ret && !list_empty(&async_list->list)) { > >>> spin_lock(&async_list->lock); > >>> atomic_inc(&async_list->cnt); > >>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work) > >>> goto restart; > >>> } > >>> ret = atomic_dec_return(&async_list->cnt); > >>> + smp_mb__after_atomic(); > >>> } > >>> } > >>> > >>> > >> I don't think this is enough, I actually think your fix is the most > >> appropriate. I will apply it, thank you! > >> > > > > Hi, Jens. > > I have tested you fix and the issue still existed. Actually the > > implementation of atomic_dec_return has been implicitly surrounded > > already by mb() and as I know, smp_mb__after/before_atomic are not > > suitable for atomic_t operation which does not return a value. > > We aren't guaranteed to see the atomic_dec_return() update if it happens > at the same time. So we can either force ordering with the smp_mb(), or > we can do something ala: > > if (!atomic_sub_return(0, &list->cnt)) { > ... > > io_add_to_prev_work() to achieve the same sort of effect. That should > work as well. Yeah, but I'd prefer smp_mb(), since atomic_sub_return(0, &list->cnt) isn't such clear. Thanks. > > -- > Jens Axboe >