Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Thu, Sep 10, 2020 at 02:26:46PM +0900, Tetsuo Handa wrote:
>> Thank you for responding. I'm also waiting for your response on
>> "[RFC PATCH] pipe: make pipe_release() deferrable." at
>> https://lore.kernel.org/linux-fsdevel/7ba35ca4-13c1-caa3-0655-50d328304462@xxxxxxxxxxxxxxxxxxx/
>> and "[PATCH] splice: fix premature end of input detection" at
>> https://lore.kernel.org/linux-block/cf26a57e-01f4-32a9-0b2c-9102bffe76b2@xxxxxxxxxxxxxxxxxxx/ .
>>
>> >
>> > NAK. The reason to defer is *NOT* to bypass that BUG_ON() - we really do not
>> > want that thing done on anything other than extremely shallow stack.
>> > Incidentally, why is that thing ever done _not_ in a kernel thread context?
>>
>> What does "that thing" refer to? acct_pin_kill() ? blob_to_mnt() ?
>> I don't know the reason because I'm not the author of these functions.
>
> The latter. What I mean, why not simply do that from inside of
> fork_usermode_driver()?

Because that is a stupid place to do the work. The usermode driver is
currently allowed to die and be respawned by the kernel when needed.
Which means there is not a 1 to 1 relationship between blob_to_mnt and
fork_usermode_driver.

As for the current code being racy, it is approximately as racy as the
current code to load files into an initrd. AKA no one has ever observed
any problems in practice, but if you squint you can see where maybe
something could happen.

I think there is a stronger argument for finding a way to guarantee
that flush_delayed_fput waits until any scheduled delayed_fput_work
has completed. That is the race Tetsuo is complaining about, and it
also appears to be present in populate_rootfs.

Flushing the fput is needed to ensure the writable struct file is
completely gone before an exec opens the file and calls
deny_write_access.

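The ordering constraint in the last paragraph can be modelled in user space. What follows is a minimal sketch in C, not kernel code. Every `model_*` name is hypothetical, and the only assumption taken from the kernel is the write-count rule: a positive count means the file is still open for write and makes the deny step fail with ETXTBSY, while a negative count means writes are denied.

```c
/* User-space model only; none of these model_* names exist in the kernel. */
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_inode {
    /* > 0: open for write; < 0: write access denied (being executed) */
    atomic_int i_writecount;
};

struct model_file {
    struct model_inode *inode;
    /* Final put was requested, but the deferred release has not run yet. */
    bool final_put_pending;
};

void model_inode_init(struct model_inode *inode)
{
    atomic_init(&inode->i_writecount, 0);
}

/* Writer side: fails if an exec has already denied write access. */
int model_get_write_access(struct model_inode *inode)
{
    int v = atomic_load(&inode->i_writecount);

    do {
        if (v < 0)
            return -ETXTBSY;
    } while (!atomic_compare_exchange_weak(&inode->i_writecount, &v, v + 1));
    return 0;
}

/* Exec side: fails while any writer still holds the file. */
int model_deny_write_access(struct model_inode *inode)
{
    int v = atomic_load(&inode->i_writecount);

    do {
        if (v > 0)
            return -ETXTBSY;
    } while (!atomic_compare_exchange_weak(&inode->i_writecount, &v, v - 1));
    return 0;
}

int model_open_for_write(struct model_file *f, struct model_inode *inode)
{
    int err = model_get_write_access(inode);

    if (err)
        return err;
    f->inode = inode;
    f->final_put_pending = false;
    return 0;
}

/* Models a deferred final put: the write count is NOT dropped here. */
void model_fput(struct model_file *f)
{
    f->final_put_pending = true;
}

/* Models the deferred work (or a flush that really waited for it). */
void model_run_deferred_fput(struct model_file *f)
{
    if (f->final_put_pending) {
        atomic_fetch_sub(&f->inode->i_writecount, 1);
        f->final_put_pending = false;
    }
}
```

In this model, calling model_deny_write_access() after model_fput() but before model_run_deferred_fput() returns -ETXTBSY. Once the deferred release has run, the same call succeeds. That is why the flush has to wait for the deferred work to finish, and not merely be called, before the exec.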
> umd_setup is stored in sub_info->init and
> eventually called from call_usermodehelper_exec_async(), right before
> the created kernel thread is about to call kernel_execve() and stop
> being a kernel thread...

I think you are suggesting calling __fput_sync in umd_setup instead of
calling fput from blob_to_mnt.

Having a special case that applies only the first time a function is
called is possible, but it is awkward and likely more error-prone.

I moved all of the user mode driver code out of exec and out of the
user mode helper code, as the user mode driver code is essentially
unused at present. The bpf folks really want to try and make it work,
so I wrote something that is not completely insane so they can have
their chance to try.

I really suspect it will go the way of all of the migration of the
early kernel init code to userspace with klibc, with the practical
details overwhelming things and making it not work, or not be worth
it, in practice. Time will tell.

I hope that is enough context to understand what is going on there.

Eric