Re: [PATCH v2 1/1] io_uring: fix infinite wait in khread_park() on io_finish_async()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/16/19 2:53 AM, Roman Penyaev wrote:
> This fixes couple of races which lead to infinite wait of park completion
> with the following backtraces:
> 
>   [20801.303319] Call Trace:
>   [20801.303321]  ? __schedule+0x284/0x650
>   [20801.303323]  schedule+0x33/0xc0
>   [20801.303324]  schedule_timeout+0x1bc/0x210
>   [20801.303326]  ? schedule+0x3d/0xc0
>   [20801.303327]  ? schedule_timeout+0x1bc/0x210
>   [20801.303329]  ? preempt_count_add+0x79/0xb0
>   [20801.303330]  wait_for_completion+0xa5/0x120
>   [20801.303331]  ? wake_up_q+0x70/0x70
>   [20801.303333]  kthread_park+0x48/0x80
>   [20801.303335]  io_finish_async+0x2c/0x70
>   [20801.303336]  io_ring_ctx_wait_and_kill+0x95/0x180
>   [20801.303338]  io_uring_release+0x1c/0x20
>   [20801.303339]  __fput+0xad/0x210
>   [20801.303341]  task_work_run+0x8f/0xb0
>   [20801.303342]  exit_to_usermode_loop+0xa0/0xb0
>   [20801.303343]  do_syscall_64+0xe0/0x100
>   [20801.303349]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
>   [20801.303380] Call Trace:
>   [20801.303383]  ? __schedule+0x284/0x650
>   [20801.303384]  schedule+0x33/0xc0
>   [20801.303386]  io_sq_thread+0x38a/0x410
>   [20801.303388]  ? __switch_to_asm+0x40/0x70
>   [20801.303390]  ? wait_woken+0x80/0x80
>   [20801.303392]  ? _raw_spin_lock_irqsave+0x17/0x40
>   [20801.303394]  ? io_submit_sqes+0x120/0x120
>   [20801.303395]  kthread+0x112/0x130
>   [20801.303396]  ? kthread_create_on_node+0x60/0x60
>   [20801.303398]  ret_from_fork+0x35/0x40
> 
>  o kthread_park() waits for park completion, so io_sq_thread() loop
>    should check kthread_should_park() along with khread_should_stop(),
>    otherwise if kthread_park() is called before prepare_to_wait()
>    the following schedule() never returns:
> 
>    CPU#0                    CPU#1
> 
>    io_sq_thread_stop():     io_sq_thread():
> 
>                                while(!kthread_should_stop() && !ctx->sqo_stop) {
> 
>       ctx->sqo_stop = 1;
>       kthread_park()
> 
> 	                            prepare_to_wait();
>                                     if (kthread_should_stop() {
> 				    }
>                                     schedule();   <<< nobody checks park flag,
> 				                  <<< so schedule and never return
> 
>  o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop
>    it is quite possible, that kthread_should_park() check and the
>    following kthread_parkme() is never called, because kthread_park()
>    has not been yet called, but few moments later is is called and
>    waits there for park completion, which never happens, because
>    kthread has already exited:
> 
>    CPU#0                    CPU#1
> 
>    io_sq_thread_stop():     io_sq_thread():
> 
>       ctx->sqo_stop = 1;
>                                while(!kthread_should_stop() && !ctx->sqo_stop) {
>                                    <<< observe sqo_stop and exit the loop
> 			       }
> 
> 			       if (kthread_should_park())
> 			           kthread_parkme();  <<< never called, since was
> 					              <<< never parked
> 
>       kthread_park()           <<< waits forever for park completion
> 
> In the current patch we quit the loop by only kthread_should_park()
> check (kthread_park() is synchronous, so kthread_should_stop() is
> never observed), and we abandon ->sqo_stop flag, since it is racy.
> At the end of the io_sq_thread() we unconditionally call parmke(),
> since we've exited the loop by the park flag.

Thanks Roman, looks (and tests) good. Applied.

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux