On Sun, May 17, 2020 at 12:17:14AM -0700, Kees Cook wrote: > On Fri, May 15, 2020 at 04:40:05PM -0700, Sargun Dhillon wrote: > > This includes the thread group leader ID in the seccomp_notif. This is > > immediately useful for opening up a pidfd for the group leader, as > > pidfds only work on group leaders. > > > > Previously, it was considered to include an actual pidfd in the > > seccomp_notif structure[1], but it was suggested to avoid proliferating > > mechanisms to create pidfds[2]. > > > > [1]: https://lkml.org/lkml/2020/1/24/133 > > [2]: https://lkml.org/lkml/2020/5/15/481 > > nit: please use lore.kernel.org/lkml/ URLs > > > Suggested-by: Christian Brauner <christian.brauner@xxxxxxxxxx> > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> > > --- > > include/uapi/linux/seccomp.h | 2 + > > kernel/seccomp.c | 1 + > > tools/testing/selftests/seccomp/seccomp_bpf.c | 50 +++++++++++++++++++ > > 3 files changed, 53 insertions(+) > > > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > > index c1735455bc53..f0c272ef0f1e 100644 > > --- a/include/uapi/linux/seccomp.h > > +++ b/include/uapi/linux/seccomp.h > > @@ -75,6 +75,8 @@ struct seccomp_notif { > > __u32 pid; > > __u32 flags; > > struct seccomp_data data; > > + __u32 tgid; > > + __u8 pad0[4]; > > }; > > I think we need to leave off padding and instead use __packed. If we > don't then userspace can't tell when "pad0" changes its "meaning" (i.e. > the size of seccomp_notif becomes 88 bytes with above -- either via > explicit padding like you've got or via implicit by the compiler. If > some other u32 gets added in the future, user space will still see "88" > as the size. > > So I *think* the right change here is: > > -}; > + __u32 tgid; > +} __packed; > > Though tgid may need to go above seccomp_data... for when it grows. > Agh... > > _However_, unfortunately, I appear to have no thought this through very > well, and there is actually no sanity-checking in the kernel for dealing > with an old userspace when sizes change. :( For example, if a userspace > doesn't check sizes and calls an ioctl, etc, the kernel will clobber the > user buffer if it's too small. I'd just argue that that's just userspace messing up. > > Even the SECCOMP_GET_NOTIF_SIZES command lacks a buffer size argument. > :( > > So: > > - should we just declare such userspace as "wrong"? I don't think > that'll work, especially since what if we ever change the size of > seccomp_data... that predated the ..._SIZES command. Yeah, that's nasty since the struct member in seccomp_notif would now clobber each other. > > - should we add a SECCOMP_SET_SIZES command to tell the kernel what > we're expecting? There's no "state" associated across seccomp(2) > calls, but maybe that doesn't matter because only user_notif writes > back to userspace. For the ioctl, the state could be part of the > private file data? Sending seccomp_data back to userspace only > happens here, and any changes in seccomp_data size will just be seen > as allowing a filter to query further into it. Sounds ok-ish in my opinion. > > - should GET_SIZES report "useful" size? (i.e. exclude padding?) Or... And that's more invasive but ultimately cleaner we v2 the whole thing so e.g. SECCOMP_IOCTL_NOTIF_RECV2, SECCOMP_IOCTL_NOTIF_SEND2, and embedd the size argument in the structs. Userspace sets the size argument, we use get_user() to get the size first and then copy_struct_from_user() to handle it cleanly based on that. A similar model as with sched (has other unrelated quirks because they messed up something too): static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *attr) { u32 size; int ret; /* Zero the full structure, so that a short copy will be nice: */ memset(attr, 0, sizeof(*attr)); ret = get_user(size, &uattr->size); if (ret) return ret; /* ABI compatibility quirk: */ if (!size) size = SCHED_ATTR_SIZE_VER0; if (size < SCHED_ATTR_SIZE_VER0 || size > PAGE_SIZE) goto err_size; ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size); if (ret) { if (ret == -E2BIG) goto err_size; return ret; } We're probably the biggest user of this right now and I'd be ok with that change. If it's a v2 than whatever. :) Christian