On Friday, 22.05.2020, 07:48 +0900, Tetsuo Handa wrote:

Hi,

I looked at your patch again and I am impressed and I need to apologize.
I looked only at the intended use, but overlooked the unintended use.
I think we need to work on the description of the patch, though.

> On 2020/05/22 4:50, Oliver Neukum wrote:
> > interesting. Do you have a test case for these patches working?
>
> No.

Yes, going through the logs it looks like you need to trigger an error
case for this race to strike.

> > >   wait_event(desc->wait,
> > >              /*
> > >               * needs both flags. We cannot do with one
> > >               * because resetting it would cause a race
> > >               * with write() yet we need to signal
> > >               * a disconnect
> > >               */
> > >              !test_bit(WDM_IN_USE, &desc->flags) ||
> > >              test_bit(WDM_DISCONNECTING, &desc->flags));
> > >
> > > but wdm_write() is not calling wake_up(&desc->wait) after
> > > clear_bit(WDM_IN_USE, &desc->flags) when usb_submit_urb() failed.
> >
> > Yes, because desc->wlock is held. There can be nobody sleeping here.
>
> Then, this patch is not needed. (But adding some comment is welcomed.)

OK, and here I screwed up. wlock is held in wdm_write(), but not in
wdm_flush().

So may I suggest the following log:

--
WDM_IN_USE is used in wdm_write() to protect against concurrent writes
and in wdm_flush() to wait for all messages to flush, so that errors
are not lost. The former use is guarded by a mutex; the latter, as it
does no IO, is not.
In the error case of wdm_write(), however, there is a race, which can
make wdm_flush() wait for IO that was supposed to be started but was
never started due to an error condition. Hence, if an error is detected
in wdm_write() after WDM_IN_USE was toggled, all potential waiters must
be woken. As multiple tasks can be in wdm_flush(), wake_up_all() must be
used in all cases where WDM_IN_USE is reset.
--

May I ask you to redo the patch with comments added stating that the
wake-up is done for the sake of wdm_flush(), change the description and
add the link to syzkaller?
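
To make the race in the suggested log concrete, here is a rough userspace
sketch. It is only a model under stated assumptions: a pthread mutex and
condition variable stand in for desc->wait, a bool stands in for
WDM_IN_USE, and all names (model_desc, model_flush, model_write_error,
run_model, MAX_FLUSHERS) are hypothetical and do not exist in the driver.
It shows several flushers sleeping on the flag and the error path
clearing it and then waking all of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_FLUSHERS 8

/* Hypothetical userspace model; none of these names exist in cdc-wdm. */
struct model_desc {
	pthread_mutex_t lock;	/* protects in_use and flushed */
	pthread_cond_t wait;	/* stands in for desc->wait */
	bool in_use;		/* stands in for WDM_IN_USE */
	int flushed;		/* flushers that have returned */
};

/* Models wdm_flush(): sleep until the "IO in flight" flag is clear. */
static void *model_flush(void *arg)
{
	struct model_desc *d = arg;

	pthread_mutex_lock(&d->lock);
	while (d->in_use)
		pthread_cond_wait(&d->wait, &d->lock);
	d->flushed++;
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Models the error path of wdm_write(): flag was set, submission failed. */
static void model_write_error(struct model_desc *d)
{
	pthread_mutex_lock(&d->lock);
	d->in_use = false;
	/* Wake every waiter: several tasks may be flushing at once. */
	pthread_cond_broadcast(&d->wait);
	pthread_mutex_unlock(&d->lock);
}

/* Returns how many flushers completed, or -1 on bad input or setup failure. */
int run_model(int nflushers)
{
	struct model_desc d = { .in_use = true, .flushed = 0 };
	pthread_t tid[MAX_FLUSHERS];
	int i, started = 0;

	if (nflushers < 0 || nflushers > MAX_FLUSHERS)
		return -1;
	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.wait, NULL);

	for (i = 0; i < nflushers; i++) {
		if (pthread_create(&tid[i], NULL, model_flush, &d) != 0)
			break;
		started++;
	}
	model_write_error(&d);
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	/* Every flusher that was started must have been released. */
	assert(d.flushed == started);

	pthread_mutex_destroy(&d.lock);
	pthread_cond_destroy(&d.wait);
	return started == nflushers ? d.flushed : -1;
}
```

In this model, dropping the broadcast from model_write_error() would leave
any flusher that is already sleeping stuck in pthread_join() forever,
which mirrors the hang described above. The broadcast plays the role that
wake_up_all() has in the suggested log.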

> >
> > And is this a bugfix? For what? Does it need to go to stable kernels?

Yes, it is. The bug is ancient. It goes back to afba937e540c9, which
introduced the driver.

Again, thank you for this impressive piece of debugging.

	Regards
		Oliver