Re: [RFC 1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on resume [efivarfs]]

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 17 Mar 2025 09:56:54 +0100

On Sun, Mar 16, 2025 at 10:26:12AM -0400, James Bottomley wrote:
> On Sun, 2025-03-16 at 07:46 +0100, Christian Brauner wrote:
> > On Sat, Mar 15, 2025 at 02:41:43PM -0400, James Bottomley wrote:
> [...]
> > > However, there's another problem: the mntput after kernel_file_open
> > > may synchronously call cleanup_mnt() (and thus deactivate_super())
> > > if the open fails because it's marked MNT_INTERNAL, which is caused
> > > by SB_KERNMOUNT.  I fixed this just by not passing the SB_KERNMOUNT
> > > flag, which feels a bit hacky.
> > 
> > It actually isn't. We know that vfs_kern_mount() will always
> > resurface the single superblock that's exposed to userspace because
> > we've just taken a reference to it earlier in efivarfs_pm_notify().
> > So that SB_KERNMOUNT flag is ignored because no new superblock is
> > allocated. It would only matter if we'd end up allocating a new
> > superblock which we never do.
> 
> I agree with the above: fc->sb_flags never propagates to the existing
> superblock.  However, nothing propagates the superblock flags back to
> fc->sb_flags either.  The check in vfs_create_mount() is on fc-
> >sb_flags.  Since the code is a bit hard to follow I added a printk on
> the path.mnt flags and sure enough it comes back with MNT_INTERNAL when
> SB_KERNMOUNT is set.
> 
> > And if we did it would be a bug because the superblock we allocate
> > could be reused at any time if a userspace task mounts efivarfs
> > before efivarfs_pm_notify() has destroyed it (or the respective
> > workqueue). But that superblock would then have SB_KERNMOUNT for
> > something that's not supposed to be one.
> 
> True, but the flags don't propagate to the superblock, so no bug.

SB_KERNMOUNT does propagate to the superblock if it is newly allocated
via sget_fc(): alloc_super(fc->fs_type, fc->sb_flags, user_ns);

But you misunderstood. "If we did it" means "If efivarfs_pm_notify()
somehow were to allocate a new superblock (which it doesn't) then having
SB_KERNMOUNT raised on the newly allocated superblock would be bug
because the superblock could be reused by userspace mounting efivars.

So removing it is the correct thing in either case. It's just confusing
to anyone maintaining that code thinking that it'd be possible for a
superblock to resurface with SB_KERNMOUNT.

> 
> > And whether or not that helper mount has MNT_INTERNAL is immaterial
> > to what you're doing here afaict.
> 
> I think the problem is the call chain mntput() -> mntput_no_expire()
> which directly calls cleanup_mnt() -> deactivate_super() if that flag
> is set.  Though I don't see that kernel_file_open() could ever fail
> except for some catastrophic reason like out of memory, so perhaps it
> isn't worth quibbling about.

Not what I'm saying. Not having MNT_INTERNAL is paramount to not
deadlocking but by not having it you're not losing anything.

> 
> > So not passing the SB_KERNMOUNT flag is the right thing (see devtmpfs
> > as well). You could slap a comment in here explaining that we never
> > allocate a new superblock so it's clear to people not familiar with
> > this particular code.
> 
> OK, so you agree that the code as written looks correct? Even if we
> don't necessarily quite agree on why.

We agree but you just misunderstood. :)