On Tue, Mar 11, 2025 at 04:55:16PM +0100, Christian Brauner wrote: > On Tue, Mar 11, 2025 at 09:01:36AM -0400, James Bottomley wrote: > > On Tue, 2025-03-11 at 09:45 +0100, Christian Brauner wrote: > > > On Tue, Mar 11, 2025 at 08:16:34AM +0100, Ard Biesheuvel wrote: > > > > (cc Al Viro) > > > > > > > > On Mon, 10 Mar 2025 at 22:49, James Bottomley > > > > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > > [...] > > > > > The problem comes down to the superblock functions not being able > > > > > to get the struct vfsmount for the superblock (because it isn't > > > > > even allocated until after they've all been called). The > > > > > assumption I was operating under was that provided I added > > > > > O_NOATIME to prevent the parent directory being updated, passing > > > > > in a NULL mnt for the purposes of iterating the directory dentry > > > > > was safe. What apparmour is trying to do is look up the idmap > > > > > for the mount point to do one of its checks. > > > > > > > > > > There are two ways of fixing this that I can think of. One would > > > > > be exporting a function that lets me dig the vfsmount out of > > > > > s_mounts and use that (it's well hidden in the internals of > > > > > fs/mount.h, so I suspect this might not be very acceptable) or to > > > > > get mnt_idmap to return > > > > > > Nope, please don't. > > > > > > > > &nop_mnt_idmap if the passed in mnt is NULL. I'd lean towards > > > > > the latter, but I'm cc'ing fsdevel to see what others think. > > > > > > A struct path with mount NULL and dentry != NULL is guaranteed to bit > > > us in the ass in other places. That's the bug. > > > > > > > > > > > > > > > > > > > > Al spotted the same issue based on a syzbot report [0] > > > > > > > > [0] https://lore.kernel.org/all/20250310235831.GL2023217@ZenIV/ > > > > > > efivars as written only has a single global superblock and it doesn't > > > support idmapped mounts and I don't see why it ever would. > > > > So that's not quite true: efivarfs currently supports uid and gid > > mapping as mount options, which certainly looks like they were designed > > to allow a second mount in a user directory. I've no idea what the > > actual use case for this is, but if I go for a single superblock, any > > reconfigure with new uid/gid would become globally effective (change > > every current mount) because they're stored in the superblock > > information. > > > > So what is the use case for this uid/gid parameter? If no-one can > > remember and no-one actually uses it, perhaps the whole config path can > > be simplified by getting rid of the options? Even if there is a use > > case, if it's single mount only then we can still go with a global > > superblock. > > So efivarfs uses get_tree_single(). That means that only a single > superblock of the filesystem type efivarfs can ever exist on the system. > > If efivars is mounted multiple times it will be the exact same > superblock that's used. IOW, mounting efivars multiple times is akin to > a bind-mount. It would be a bit ugly but it could be done by making sure > that any uid/gid changes are reflected. But see below. > > > > > > But since efivars does only ever have a single global superblock, one > > > possibility is to an internal superblock that always exits and is > > > resurfaced whenever userspace mounts efivarfs. That's essentially the > > > devtmpfs model. > > > > > > Then you can stash: > > > > > > static struct vfsmount *efivarfs_mnt; > > > > > > globally and use that in efivarfs_pm_notify() to fill in struct path. > > > > I didn't see devtmpfs when looking for examples, since it's hiding > > outside of the fs/ directory. However, it does seem to be a bit legacy > > nasty as an example to copy. However, I get the basics: we'd > > instantiate the mnt and superblock on init (stashing mnt in the sfi so > > the notifier gets it). Then we can do the variable population on > > reconfigure, just in case an EFI system doesn't want to mount efivarfs > > to save memory. > > > > I can code that up if I can get an answer to the uid/gid parameter > > question above. > > I have some questions. efivarfs registers efivarfs_pm_notify even before > a superblock exists in efivarfs_init_fs_context(). That's called during > fd_context = fsopen("efivarfs") before a superblock even exists: > > (1) Is it guaranteed that efivarfs_pm_notify() is only called once a > superblock exists? > > (2) Is it guaranteed that efivarfs_pm_notify() is only called when and > while a mount for the superblock exists? > > If the question to either one of those is "no" then the global > vfsmount hack will not help. > > >From reading efivarfs_pm_notify() it looks like the answer to (1) is > "yes" because you're dereferencing sfi->sb->s_root in > efivarfs_pm_notify(). > > But I'm not at all certain that (2) isn't a "no" and that > efivarfs_pm_notify() can be called before a mount exists. IOW, once > fsconfig(FSCONFIG_CMD_CREATE) is called the notifier seems ready and > registered but userspace isn't forced to call fsmount(fd_fs) at all. > > They could just not to do it for whatever reason but the notifier should > already be able to run. > > Another question is whether the superblock can be freed while > efivarfs_pm_notify() is running? I think that can't happen because > blocking_notifier_chain_unregister(&efivar_ops_nh, &sfi->nb) will block > in efivarfs_kill_sb() until all outstanding calls to > efivarfs_pm_notify() are finished? > > If (2) isn't guranteed then efivarfs_pm_notify() needs to be rewritten > without relying on files because there's no guarantee that a mount > exists at all. Btw, sorry but I'm going to be able to (guarantee) to respond Wed - Fri because I'm on kid-watch-duty.