On 2021/02/11 5:15, Shuah Khan wrote: > On 2/10/21 11:43 AM, Tetsuo Handa wrote: >> On 2021/02/11 3:20, Shuah Khan wrote: >>> On 2/10/21 11:16 AM, Tetsuo Handa wrote: >>>> On 2021/02/11 3:11, Shuah Khan wrote: >>>>> I would like to see to see a complete fix. This patch changes >>>>> kthread_get_run() to return NULL. Without adding handling for >>>>> NULL in the callers of kthread_get_run(), we will start seeing >>>>> problems. >>>> >>>> What problems are you aware of? >>>> >>> >>> The fact that driver doesn't cleanup after failing to create >>> the thread is a problem. >> >> What are the cleanup functions? >> > > When user-space requests attaching to a device, attach_store() > tries to attach the requested device. When kthread_get_run() > failure is ignored silently, and continue with call to > rh_port_connect(), user-space assumes attach is successful. > User thinks attach is successful. The struct kthread_create_info *create = kmalloc(sizeof(*create), GFP_KERNEL); in __kthread_create_on_node() never fails unless killed by the OOM killer due to the "too small to fail" memory-allocation rule, and wait_for_completion_killable(&done) in __kthread_create_on_node() never fails unless killed. Creating a kernel thread as root user unlikely fails, and memory allocations by that kernel thread also never fails due to the "too small to fail" memory-allocation rule. Therefore, kthread_get_run() effectively fails only when current thread which called attach_store() was killed. And > > When and how will this attach failure gets reported to the > in this scenario? if the current thread was killed, how can the failure get reported to the user-space in this scenario? > > Error handling for this case is no different from other error > paths in attach_store(). > > Please see error handling for other errors in attach_store(). Being "killed" means that user-space can never know the result unlike other error paths in attach_store(). > In this case the right error handling is to rewind the vdev > init and bail out returning error. This would include setting > vdev->ud.status to VDEV_ST_NULL. If the user-space was killed, the kernel is responsible for offering automatic cleanup which includes setting vdev->ud.status to VDEV_ST_NULL. > > I found the following reproducer that tells me how attach > is triggered. > https://syzkaller.appspot.com/text?tag=ReproC&x=128506e4d00000 This reproducer (which is killed after 5 seconds from fork()) uses only /sys/devices/platform/vhci_hcd.0/attach interface and never uses detach interface. Detach and cleanup are up to automatic cleanup offered by the kernel. > > syzbot is helping us harden these paths, which is awesome. > Fixing these have to consider user api. > > I you would like to fix this, please send me a complete fix. If you want to handle the unlikely "__kthread_create_on_node() fails without being killed" case, such change ( the drivers/usb/usbip/vhci_sysfs.c portion in https://syzkaller.appspot.com/x/patch.diff?x=16c3c090d00000 ) should be a separate patch. Since this patch declares "Fixes: 9720b4bc76a83807 ("staging/usbip: convert to kthread")", this patch intends for the minimal change and does not want to do extra things.