Re: [RFC 1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on resume [efivarfs]]

Ard Biesheuvel <ardb@xxxxxxxxxx> · Sun, 16 Mar 2025 20:19:48 +0100

On Sun, 16 Mar 2025 at 15:26, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, 2025-03-16 at 07:46 +0100, Christian Brauner wrote:
> > On Sat, Mar 15, 2025 at 02:41:43PM -0400, James Bottomley wrote:
> [...]
> > > However, there's another problem: the mntput after kernel_file_open
> > > may synchronously call cleanup_mnt() (and thus deactivate_super())
> > > if the open fails because it's marked MNT_INTERNAL, which is caused
> > > by SB_KERNMOUNT.  I fixed this just by not passing the SB_KERNMOUNT
> > > flag, which feels a bit hacky.
> >
> > It actually isn't. We know that vfs_kern_mount() will always
> > resurface the single superblock that's exposed to userspace because
> > we've just taken a reference to it earlier in efivarfs_pm_notify().
> > So that SB_KERNMOUNT flag is ignored because no new superblock is
> > allocated. It would only matter if we'd end up allocating a new
> > superblock which we never do.
>
> I agree with the above: fc->sb_flags never propagates to the existing
> superblock.  However, nothing propagates the superblock flags back to
> fc->sb_flags either.  The check in vfs_create_mount() is on fc-
> >sb_flags.  Since the code is a bit hard to follow I added a printk on
> the path.mnt flags and sure enough it comes back with MNT_INTERNAL when
> SB_KERNMOUNT is set.
>
> > And if we did it would be a bug because the superblock we allocate
> > could be reused at any time if a userspace task mounts efivarfs
> > before efivarfs_pm_notify() has destroyed it (or the respective
> > workqueue). But that superblock would then have SB_KERNMOUNT for
> > something that's not supposed to be one.
>
> True, but the flags don't propagate to the superblock, so no bug.
>
> > And whether or not that helper mount has MNT_INTERNAL is immaterial
> > to what you're doing here afaict.
>
> I think the problem is the call chain mntput() -> mntput_no_expire()
> which directly calls cleanup_mnt() -> deactivate_super() if that flag
> is set.  Though I don't see that kernel_file_open() could ever fail
> except for some catastrophic reason like out of memory, so perhaps it
> isn't worth quibbling about.
>
> > So not passing the SB_KERNMOUNT flag is the right thing (see devtmpfs
> > as well). You could slap a comment in here explaining that we never
> > allocate a new superblock so it's clear to people not familiar with
> > this particular code.
>
> OK, so you agree that the code as written looks correct? Even if we
> don't necessarily quite agree on why.
>

Thanks for making progress on this. It would be nice if we could fix
this before the v6.14 release, given that the code in question was
introduced this cycle.

And there's another suggestion from Al, to use inode_lock_nested() to
work around the lockdep warning. I can take care of that one, unless
you prefer to do it yourself?