On Sun, Jan 23, 2022 at 04:33:41PM -0800, Tong Zhang wrote:
> We should unregister the table upon module unload otherwise something
> horrible will happen when we load binfmt_misc module again. Also note
> that we should keep value returned by register_sysctl_mount_point() and
> release it later, otherwise it will leak.
>
> reproduce:
> modprobe binfmt_misc
> modprobe -r binfmt_misc
> modprobe binfmt_misc
> modprobe -r binfmt_misc
> modprobe binfmt_misc
>
> [ 18.032038] Call Trace:
> [ 18.032108] <TASK>
> [ 18.032169] dump_stack_lvl+0x34/0x44
> [ 18.032273] __register_sysctl_table+0x6f4/0x720
> [ 18.032397] ? preempt_count_sub+0xf/0xb0
> [ 18.032508] ? 0xffffffffc0040000
> [ 18.032600] init_misc_binfmt+0x2d/0x1000 [binfmt_misc]
> [ 18.042520] binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
> modprobe: can't load module binfmt_misc (kernel/fs/binfmt_misc.ko): Cannot allocate memory
> [ 18.063549] binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
> [ 18.204779] BUG: unable to handle page fault for address: fffffbfff8004802
>
> Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file")
> Signed-off-by: Tong Zhang <ztong0001@xxxxxxxxx>
> ---
> fs/binfmt_misc.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index ddea6acbddde..614aedb8ab2e 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -817,12 +817,16 @@ static struct file_system_type bm_fs_type = {
>  };
>  MODULE_ALIAS_FS("binfmt_misc");
>
> +static struct ctl_table_header *binfmt_misc_header;
> +
>  static int __init init_misc_binfmt(void)
>  {
>  	int err = register_filesystem(&bm_fs_type);
>  	if (!err)
>  		insert_binfmt(&misc_format);
> -	if (!register_sysctl_mount_point("fs/binfmt_misc")) {
> +
> +	binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc");
> +	if (!binfmt_misc_header) {

The fix itself is obviously needed.
However, afaict the previous patch introduced another bug and this
patch right here doesn't fix it either. Namely, if you set
CONFIG_SYSCTL=n and CONFIG_BINFMT_MISC={y,m}, then
register_sysctl_mount_point() will return NULL, causing modprobe
binfmt_misc to fail. However, before 3ba442d5331f ("fs: move
binfmt_misc sysctl to its own file") loading binfmt_misc would've
succeeded even if fs/binfmt_misc wasn't created in kernel/sysctl.c.

Afaict, that goes for both CONFIG_SYSCTL={y,n}, since even in the
CONFIG_SYSCTL=y case the kernel would've moved on if creating the
sysctl header had failed. And that makes sense since binfmt_misc is
mountable wherever, not just at fs/binfmt_misc.

All that indicates that the correct fix here would be to simply:

	binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc");

without checking for an error. That should fully restore the old
behavior.

>  		pr_warn("Failed to create fs/binfmt_misc sysctl mount point");
>  		return -ENOMEM;
>  	}
> @@ -831,6 +835,7 @@ static int __init init_misc_binfmt(void)
>
>  static void __exit exit_misc_binfmt(void)
>  {
> +	unregister_sysctl_table(binfmt_misc_header);
>  	unregister_binfmt(&misc_format);
>  	unregister_filesystem(&bm_fs_type);
>  }
> --
> 2.25.1
>
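
To make the suggestion concrete, here is a rough userspace sketch of the
init/exit pair with the NULL check dropped. It is not kernel code: the
register_sysctl_mount_point() and unregister_sysctl_table() below are
hypothetical stand-ins that only mimic the return conventions discussed
in this thread, sysctl_enabled is an invented flag modelling
CONFIG_SYSCTL={y,n}, and the filesystem/binfmt registration is reduced to
a boolean. It also assumes that unregistering a NULL header is a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel API. */
struct ctl_table_header { int unused; };

static bool sysctl_enabled;         /* hypothetical: models CONFIG_SYSCTL={y,n} */
static bool mount_point_registered; /* true while "fs/binfmt_misc" is registered */
static struct ctl_table_header the_header;

/* Returns NULL when sysctl is disabled or the path is already registered. */
static struct ctl_table_header *register_sysctl_mount_point(const char *path)
{
	(void)path;
	if (!sysctl_enabled || mount_point_registered)
		return NULL;
	mount_point_registered = true;
	return &the_header;
}

/* Assumption: a NULL header is silently ignored. */
static void unregister_sysctl_table(struct ctl_table_header *header)
{
	if (!header)
		return;
	assert(header == &the_header);
	mount_point_registered = false;
}

static struct ctl_table_header *binfmt_misc_header;
static bool binfmt_loaded;

static int init_misc_binfmt(void)
{
	int err = 0; /* stand-in for register_filesystem(&bm_fs_type) */

	if (!err)
		binfmt_loaded = true; /* stand-in for insert_binfmt(&misc_format) */

	/* Keep the header for unload, but do not treat NULL as fatal. */
	binfmt_misc_header = register_sysctl_mount_point("fs/binfmt_misc");
	return err;
}

static void exit_misc_binfmt(void)
{
	unregister_sysctl_table(binfmt_misc_header);
	binfmt_misc_header = NULL;
	binfmt_loaded = false; /* stand-in for unregister_binfmt/unregister_filesystem */
}
```

With this shape, a load/unload/load cycle succeeds whether or not the
mount point could be created, and the header is released on unload when
it exists.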