On Sun, May 17, 2020 at 12:17 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > > On Fri, May 15, 2020 at 04:40:05PM -0700, Sargun Dhillon wrote: > > This includes the thread group leader ID in the seccomp_notif. This is > > immediately useful for opening up a pidfd for the group leader, as > > pidfds only work on group leaders. > > > > Previously, it was considered to include an actual pidfd in the > > seccomp_notif structure[1], but it was suggested to avoid proliferating > > mechanisms to create pidfds[2]. > > > > [1]: https://lkml.org/lkml/2020/1/24/133 > > [2]: https://lkml.org/lkml/2020/5/15/481 > > nit: please use lore.kernel.org/lkml/ URLs > > > Suggested-by: Christian Brauner <christian.brauner@xxxxxxxxxx> > > Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx> > > --- > > include/uapi/linux/seccomp.h | 2 + > > kernel/seccomp.c | 1 + > > tools/testing/selftests/seccomp/seccomp_bpf.c | 50 +++++++++++++++++++ > > 3 files changed, 53 insertions(+) > > > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > > index c1735455bc53..f0c272ef0f1e 100644 > > --- a/include/uapi/linux/seccomp.h > > +++ b/include/uapi/linux/seccomp.h > > @@ -75,6 +75,8 @@ struct seccomp_notif { > > __u32 pid; > > __u32 flags; > > struct seccomp_data data; > > + __u32 tgid; > > + __u8 pad0[4]; > > }; > > I think we need to leave off padding and instead use __packed. If we > don't then userspace can't tell when "pad0" changes its "meaning" (i.e. > the size of seccomp_notif becomes 88 bytes with above -- either via > explicit padding like you've got or via implicit by the compiler. If > some other u32 gets added in the future, user space will still see "88" > as the size. > I've had previous feedback about using "packed". See: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@xxxxxxxxxxxxxxxxx/ https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@xxxxxxxxxxxxxxxxxx/ > So I *think* the right change here is: > > -}; > + __u32 tgid; > +} __packed; > > Though tgid may need to go above seccomp_data... for when it grows. > Agh... (How) can seccomp_data grow safely, even with this extensibility mechanism? > > _However_, unfortunately, I appear to have no thought this through very > well, and there is actually no sanity-checking in the kernel for dealing > with an old userspace when sizes change. :( For example, if a userspace > doesn't check sizes and calls an ioctl, etc, the kernel will clobber the > user buffer if it's too small. > > Even the SECCOMP_GET_NOTIF_SIZES command lacks a buffer size argument. > :( > > So: > > - should we just declare such userspace as "wrong"? I don't think > that'll work, especially since what if we ever change the size of > seccomp_data... that predated the ..._SIZES command. > > - should we add a SECCOMP_SET_SIZES command to tell the kernel what > we're expecting? There's no "state" associated across seccomp(2) > calls, but maybe that doesn't matter because only user_notif writes > back to userspace. For the ioctl, the state could be part of the > private file data? Sending seccomp_data back to userspace only > happens here, and any changes in seccomp_data size will just be seen > as allowing a filter to query further into it. Will we ever grow seccomp_data? I suggest we throw away the _SIZES api, and just introduce RECV2, which sends back a known, fixed format, and deprecate these dynamically sized uapi shenanigans. (Queue RECV3, etc..) Maybe we do something like perf_event_open, where there's a read_format, and that's used by the user to determine how big of a response / fields they want to get? > > - should GET_SIZES report "useful" size? (i.e. exclude padding?) > > > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c > > Yay test updates! :) > > > +TEST(user_notification_groupleader) > > In my first pass of review I was going to say "can you please also check > the sizes used by the ioctl?" But that triggered the above size checking > mess in my mind. > > Let me look at this more closely on Monday, and I'll proposed something. > :P To summarize my set of ideas: 1. We take the ptrace-style API, where we have a request to get the tgid of a given request ID (or any new / extensible field) 2. We add a perf_event_open style API, where you have to tell it what fields to include in the response 3. We introduce RECV2 [through N] 4. We never extend seccomp_data, and just continue to append things to the API 5. We rev the API _once_ and unroll seccomp_data, and make it so that new members have to be *asked for*, rather than are implicitly included. > > Thanks! > > -Kees > > -- > Kees Cook _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers