Re: [RFC PATCH] io_uring: add a memory barrier before atomic_read

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/20/19 9:14 AM, Zhengyuan Liu wrote:
> On Sat, Jul 20, 2019 at 2:27 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 7/18/19 6:54 PM, Zhengyuan Liu wrote:
>>>
>>> On 7/19/19 12:43 AM, Jens Axboe wrote:
>>>> On 7/18/19 9:41 AM, Jens Axboe wrote:
>>>>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>>>>>> There is a hang issue while using fio to do some basic test. The issue can
>>>>>> been easily reproduced using bellow scripts:
>>>>>>
>>>>>>              while true
>>>>>>              do
>>>>>>                      fio  --ioengine=io_uring  -rw=write -bs=4k -numjobs=1 \
>>>>>>                           -size=1G -iodepth=64 -name=uring   --filename=/dev/zero
>>>>>>              done
>>>>>>
>>>>>> After serveral minutes, maybe more, fio would block at
>>>>>> io_uring_enter->io_cqring_wait in order to waiting for previously committed
>>>>>> sqes to be completed and cann't return to user anymore until we send a SIGTERM
>>>>>> to fio. After got SIGTERM, fio turns to hang at io_ring_ctx_wait_and_kill with
>>>>>> a backtrace like this:
>>>>>>
>>>>>>              [54133.243816] Call Trace:
>>>>>>              [54133.243842]  __schedule+0x3a0/0x790
>>>>>>              [54133.243868]  schedule+0x38/0xa0
>>>>>>              [54133.243880]  schedule_timeout+0x218/0x3b0
>>>>>>              [54133.243891]  ? sched_clock+0x9/0x10
>>>>>>              [54133.243903]  ? wait_for_completion+0xa3/0x130
>>>>>>              [54133.243916]  ? _raw_spin_unlock_irq+0x2c/0x40
>>>>>>              [54133.243930]  ? trace_hardirqs_on+0x3f/0xe0
>>>>>>              [54133.243951]  wait_for_completion+0xab/0x130
>>>>>>              [54133.243962]  ? wake_up_q+0x70/0x70
>>>>>>              [54133.243984]  io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>>>>>              [54133.243998]  io_uring_release+0x20/0x30
>>>>>>              [54133.244008]  __fput+0xcf/0x270
>>>>>>              [54133.244029]  ____fput+0xe/0x10
>>>>>>              [54133.244040]  task_work_run+0x7f/0xa0
>>>>>>              [54133.244056]  do_exit+0x305/0xc40
>>>>>>              [54133.244067]  ? get_signal+0x13b/0xbd0
>>>>>>              [54133.244088]  do_group_exit+0x50/0xd0
>>>>>>              [54133.244103]  get_signal+0x18d/0xbd0
>>>>>>              [54133.244112]  ? _raw_spin_unlock_irqrestore+0x36/0x60
>>>>>>              [54133.244142]  do_signal+0x34/0x720
>>>>>>              [54133.244171]  ? exit_to_usermode_loop+0x7e/0x130
>>>>>>              [54133.244190]  exit_to_usermode_loop+0xc0/0x130
>>>>>>              [54133.244209]  do_syscall_64+0x16b/0x1d0
>>>>>>              [54133.244221]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>>
>>>>>> The reason is that we had added a req to ctx->pending_async at the very end, but
>>>>>> it got no chance to be processed anymore. How could this be happened?
>>>>>>
>>>>>>              fio#cpu0                                        wq#cpu1
>>>>>>
>>>>>>              io_add_to_prev_work                    io_sq_wq_submit_work
>>>>>>
>>>>>>                atomic_read() <<< 1
>>>>>>
>>>>>>                                                        atomic_dec_return() << 1->0
>>>>>>                                                        list_empty();    <<< true;
>>>>>>
>>>>>>                list_add_tail()
>>>>>>                atomic_read() << 0 or 1?
>>>>>>
>>>>>> As was said in atomic_ops.rst, atomic_read does not guarantee that the runtime
>>>>>> initialization by any other thread is visible yet, so we must take care of that
>>>>>> with a proper implicit or explicit memory barrier;
>>>>> Thanks for looking at this and finding this issue, it does looks like a problem.
>>>>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
>>>>> on the atomic_dec_return() side of things? Like the below.
>>>>>
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>>>               */
>>>>>              if (async_list) {
>>>>>                      ret = atomic_dec_return(&async_list->cnt);
>>>>> +           smp_mb__after_atomic();
>>>>>                      while (!ret && !list_empty(&async_list->list)) {
>>>>>                              spin_lock(&async_list->lock);
>>>>>                              atomic_inc(&async_list->cnt);
>>>>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>>>>                                      goto restart;
>>>>>                              }
>>>>>                              ret = atomic_dec_return(&async_list->cnt);
>>>>> +                   smp_mb__after_atomic();
>>>>>                      }
>>>>>              }
>>>>>
>>>>>
>>>> I don't think this is enough, I actually think your fix is the most
>>>> appropriate. I will apply it, thank you!
>>>>
>>>
>>> Hi, Jens.
>>> I have tested you fix and the issue still existed. Actually the
>>> implementation of atomic_dec_return has been implicitly surrounded
>>> already by mb()  and as I know, smp_mb__after/before_atomic are not
>>> suitable for atomic_t operation which does not return a value.
>>
>> We aren't guaranteed to see the atomic_dec_return() update if it happens
>> at the same time. So we can either force ordering with the smp_mb(), or
>> we can do something ala:
>>
>>          if (!atomic_sub_return(0, &list->cnt)) {
>>                  ...
>>
>> io_add_to_prev_work() to achieve the same sort of effect. That should
>> work as well.
> 
> Yeah,  but I'd prefer smp_mb(), since atomic_sub_return(0, &list->cnt) isn't
> such clear.

In some ways I actually think it's more clear, as it makes it explicit
what we're synchronizing with, and it's then up to the atomic primitives
to use the right barrier. But I'm fine with the smp_mb() and that's what
I already queued up, so let's just stick with that.

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux