On Tue, Feb 04, 2025 at 10:57:39AM -0500, Jeff Layton wrote: > On Tue, 2025-02-04 at 16:09 +0100, Christian Brauner wrote: > > On Tue, Feb 04, 2025 at 07:28:20AM -0500, Jeff Layton wrote: > > > On Tue, 2025-02-04 at 12:07 +0100, Christian Brauner wrote: > > > > On Mon, Feb 03, 2025 at 12:09:48PM -0500, Jeff Layton wrote: > > > > > Some of the fields in the statmount() reply can be optional. If the > > > > > kernel has nothing to emit in that field, then it doesn't set the flag > > > > > in the reply. This presents a problem: There is currently no way to > > > > > know what mask flags the kernel supports since you can't always count on > > > > > them being in the reply. > > > > > > > > > > Add a new STATMOUNT_SUPPORTED_MASK flag and field that the kernel can > > > > > set in the reply. Userland can use this to determine if the fields it > > > > > requires from the kernel are supported. This also gives us a way to > > > > > deprecate fields in the future, if that should become necessary. > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > > > > --- > > > > > I ran into this problem recently. We have a variety of kernels running > > > > > that have varying levels of support of statmount(), and I need to be > > > > > able to fall back to /proc scraping if support for everything isn't > > > > > present. This is difficult currently since statmount() doesn't set the > > > > > flag in the return mask if the field is empty. > > > > > --- > > > > > fs/namespace.c | 18 ++++++++++++++++++ > > > > > include/uapi/linux/mount.h | 4 +++- > > > > > 2 files changed, 21 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/fs/namespace.c b/fs/namespace.c > > > > > index a3ed3f2980cbae6238cda09874e2dac146080eb6..7ec5fc436c4ff300507c4ed71a757f5d75a4d520 100644 > > > > > --- a/fs/namespace.c > > > > > +++ b/fs/namespace.c > > > > > @@ -5317,6 +5317,21 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root) > > > > > return 0; > > > > > } > > > > > > > > > > +/* This must be updated whenever a new flag is added */ > > > > > +#define STATMOUNT_SUPPORTED (STATMOUNT_SB_BASIC | \ > > > > > + STATMOUNT_MNT_BASIC | \ > > > > > + STATMOUNT_PROPAGATE_FROM | \ > > > > > + STATMOUNT_MNT_ROOT | \ > > > > > + STATMOUNT_MNT_POINT | \ > > > > > + STATMOUNT_FS_TYPE | \ > > > > > + STATMOUNT_MNT_NS_ID | \ > > > > > + STATMOUNT_MNT_OPTS | \ > > > > > + STATMOUNT_FS_SUBTYPE | \ > > > > > + STATMOUNT_SB_SOURCE | \ > > > > > + STATMOUNT_OPT_ARRAY | \ > > > > > + STATMOUNT_OPT_SEC_ARRAY | \ > > > > > + STATMOUNT_SUPPORTED_MASK) > > > > > > > > Hm, do we need a separate bit for STATMOUNT_SUPPORTED_MASK? Afaiu, this > > > > is more of a convenience thing but then maybe we just do: > > > > > > > > #define STATMOUNT_SUPPORTED_MASK STATMOUNT_MNT_BASIC > > > > > > > > and be done with it? > > > > > > > > Otherwise I think it is worth having support for this. > > > > > > > > > > Are you suggesting that we should just add the ->supported_mask field > > > without a declaring a bit for it? If so, how would that work on old > > > kernels? You'd never know if you could trust the contents of that field > > > since the return mask wouldn't indicate any difference. > > > > What I didn't realize because I hadn't read carefully enough in your > > patch was that STATMOUNT_SUPPORTED_MASK is raised in ->mask and only > > then is ->supported_mask filled in. > > > > My thinking had been that ->supported_mask will simply always be filled > > in by the kernel going forward. Which is arguably not ideal but would > > work: > > > > So the kernel guarantees since the introduction of statmount() that when > > we copy out to userspace that anything the kernel doesn't know will be > > copied back zeroed. So any unknown fields are zero. > > > > (1) Say userspace passes a struct statmount with ->supported_mask to the > > kernel - even if it has put garbage in there or intentionally raised > > valid flags in there - the old kernel will copy over this and set it > > to zero. > > > > (2) If you're on a new kernel but pass an old struct the kernel will > > fill in ->supported_mask. Imho, that's fine. Userspace simply will > > not know about it. > > > > But we can leave the explicit request in! > > > > > I can respin without STATMOUNT_SUPPORTED_MASK. I was thinking it left > that part of the return buffer untouched, but if it's zeroed, then that > works as well. > > If you see a supported_mask of 0, you know the kernel didn't fill it in > (since it should at least support _something_). That'll need to be > carefully documented though. It's probably easier for userspace if that flag must be specifically raised.