On Sat, 2025-03-15 at 11:04 +0100, Christian Brauner wrote: [...] > Since efivars uses a single global superblock and we know that sfi- > >sb is still alive (After all we've just pinned it above.) > vfs_kern_mount() will reuse the same superblock. > > There's two cases to consider: > > (1) vfs_kern_mount() was successful. In this case path->mnt will hold > an active superblock reference that will be released asynchronously > via __fput(). That is safe and correct. > > (2) vfs_kern_mount() fails. That's an issue because you need to call > deactivate_super() which will have a similar deadlock problem. > If efivarfs_pm_notify() now holds the last reference to the > superblock then deactivate_super() super will put that last > reference and call efivarfs_kill_super() which in turn will wait for > efivarfs_pm_notify() to finish. => deadlock > > So in the error case you need to offload the call to > deactivate_super() to a workqueue. OK, got that (although it did make my head explode a bit ... this is certainly subtle stuff). To do the delayed work for the deactivate_super(), I hijacked the superblock destroy_work structure which I think is safe because by the time the work structure is executed, we own it and so it can be reused for the actual destroy_work in deactivate_super(). However, there's another problem: the mntput after kernel_file_open may synchronously call cleanup_mnt() (and thus deactivate_super()) if the open fails because it's marked MNT_INTERNAL, which is caused by SB_KERNMOUNT. I fixed this just by not passing the SB_KERNMOUNT flag, which feels a bit hacky. I've put together everything at the bottom, however, I can't test the error legs of this because trying to trigger and unmount and hybernate at exactly the right point is pretty much impossible. The rest seems to work as advertised, although I would like a tested-by from the apparmour people because I do run apparmour in my debian test rig but don't see the problem. Regards, James --- diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c index 6eae8cf655c1..2d826e98066b 100644 --- a/fs/efivarfs/super.c +++ b/fs/efivarfs/super.c @@ -474,12 +474,25 @@ static int efivarfs_check_missing(efi_char16_t *name16, efi_guid_t vendor, return err; } +static void efivarfs_deactivate_super_work(struct work_struct *work) +{ + struct super_block *s = container_of(work, struct super_block, + destroy_work); + /* + * note: here s->destroy_work is free for reuse (which + * will happen in deactivate_super) + */ + deactivate_super(s); +} + +static struct file_system_type efivarfs_type; + static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, void *ptr) { struct efivarfs_fs_info *sfi = container_of(nb, struct efivarfs_fs_info, pm_nb); - struct path path = { .mnt = NULL, .dentry = sfi->sb->s_root, }; + struct path path; struct efivarfs_ctx ectx = { .ctx = { .actor = efivarfs_actor, @@ -487,6 +500,7 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, .sb = sfi->sb, }; struct file *file; + struct super_block *s = sfi->sb; static bool rescan_done = true; if (action == PM_HIBERNATION_PREPARE) { @@ -499,11 +513,39 @@ static int efivarfs_pm_notify(struct notifier_block *nb, unsigned long action, if (rescan_done) return NOTIFY_DONE; + /* ensure single superblock is alive and pin it */ + if (!atomic_inc_not_zero(&s->s_active)) + return NOTIFY_DONE; + pr_info("efivarfs: resyncing variable state\n"); - /* O_NOATIME is required to prevent oops on NULL mnt */ + path.dentry = sfi->sb->s_root; + + /* do not add SB_KERNMOUNT which causes MNT_INTERNAL, see below */ + path.mnt = vfs_kern_mount(&efivarfs_type, 0, + efivarfs_type.name, NULL); + if (IS_ERR(path.mnt)) { + pr_err("efivarfs: internal mount failed\n"); + /* + * We may be the last pinner of the superblock but + * calling efivarfs_kill_sb from within the notifier + * here would deadlock trying to unregister it + */ + INIT_WORK(&s->destroy_work, efivarfs_deactivate_super_work); + schedule_work(&s->destroy_work); + return PTR_ERR(path.mnt); + } + + /* path.mnt now has pin on superblock, so this must be above one */ + atomic_dec(&s->s_active); + file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY | O_NOATIME, current_cred()); + /* + * safe even if last put because no MNT_INTERNAL means this + * will do delayed deactivate_super and not deadlock + */ + mntput(path.mnt); if (IS_ERR(file)) return NOTIFY_DONE;