On Mon, May 18, 2020 at 02:53:25PM +0200, Christian Brauner wrote: > On Sun, May 17, 2020 at 09:02:15AM -0600, Tycho Andersen wrote: > > On Sun, May 17, 2020 at 08:46:03AM -0600, Tycho Andersen wrote: > > > On Sun, May 17, 2020 at 04:33:11PM +0200, Christian Brauner wrote: > > > > struct seccomp_notif2 { > > > > __u32 notif_size; > > > > __u64 id; > > > > __u32 pid; > > > > __u32 flags; > > > > struct seccomp_data data; > > > > __u32 data_size; > > > > }; > > > > > > I guess you need to put data_size before data, otherwise old userspace > > > with a smaller struct seccomp_data will look in the wrong place. > > > > > > But yes, that'll work if you put two sizes in, which is probably > > > reasonable since we're talking about two structs. > > > > Well, no, it doesn't either. Suppose we add a new field first to > > struct seccomp_notif2: > > > > struct seccomp_notif2 { > > __u32 notif_size; > > __u64 id; > > __u32 pid; > > __u32 flags; > > struct seccomp_data data; > > __u32 data_size; > > __u32 new_field; > > }; > > > > And next we add a new field to struct secccomp_data. When a userspace > > compiled with just the new seccomp_notif2 field does: > > > > seccomp_notif2.new_field = ...; > > > > the compiler will put it in the wrong place for the kernel with the > > new seccomp_data field too. > > > > Sort of feels like we should do: > > > > struct seccomp_notif2 { > > struct seccomp_notif *notif; > > struct seccomp_data *data; > > }; > > > > ? > > Oh yes of course, sorry that was my stupid typo. I meant: > > struct seccomp_notif2 { > __u32 notif_size; > __u64 id; > __u32 pid; > __u32 flags; > struct seccomp_data *data; > __u32 data_size; > __u32 new_field; > } One big difference in the approach I described is that the user gets to ask for specific fields, and can configure the listener upfront, versus having to do the dance of fetching the sizes, and dynamically allocating memory. This way userspace can just do on-stack static allocations. We can get rid of the kbuf bits in my PR, if we incrementally fill up the userspace buffer (copy_to_user shouldn't be *that* costly). In addition, we're not copying a bunch of unnecessary data, or calculating values that the user may not be interested in. This is particularly valuable if we ever want to do things like passing optional fields (think PIDs) back. Code-wise, it looks something like: struct read_output_format { __u64 id; __u32 tgid; struct seccomp_data data; } __packed; TEST(user_notification_read) { long ret; int status, pid, listener, read_size; struct seccomp_notif_config config = {}; struct seccomp_notif_resp resp = {}; struct read_output_format buf; ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); ASSERT_EQ(0, ret) { TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); } listener = user_trap_syscall(__NR_dup, SECCOMP_FILTER_FLAG_NEW_LISTENER); ASSERT_GE(listener, 0); EXPECT_EQ(read(listener, &buf, sizeof(buf)), -1) { EXPECT_EQ(errno, -EINVAL); } config.size = sizeof(config); config.seccomp_data_size = sizeof(struct seccomp_data); config.optional_fields = ~(0); /* Make sure invalid fields are not accepted */ ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_CONFIG, &config), -1); config.optional_fields = SECCOMP_NOTIF_FIELD_TGID; ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_CONFIG, &config), sizeof(buf)); pid = fork(); ASSERT_GE(pid, 0); if (pid == 0) exit(syscall(__NR_dup, 42, 1, 1, 1) != USER_NOTIF_MAGIC); /* Passing a smaller value in should fail */ EXPECT_EQ(read(listener, &buf, read_size - 1), -1) { EXPECT_EQ(errno, -E2BIG); } /* Passing a larger value in should succeed */ ASSERT_EQ(read(listener, &buf, 200), sizeof(buf)); EXPECT_EQ(buf.tgid, pid); EXPECT_EQ(buf.data.args[0], 42); EXPECT_EQ(buf.data.nr, __NR_dup); resp.id = buf.id; resp.error = 0; resp.val = USER_NOTIF_MAGIC; EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); EXPECT_EQ(waitpid(pid, &status, 0), pid); EXPECT_EQ(true, WIFEXITED(status)); EXPECT_EQ(0, WEXITSTATUS(status)); } > > at which point things should just work imho. This is similar to how the > set_tid array works. The kernel doesn't need to allocate any more too. > The kernel can just always use the currently know seccomp_data size. > If the kernel supports _less_ than what the caller expects, it can > report the supported size in data_size to userspace returning EINVAL. If > it supports more then it can just copy the known fields, I guess. > > This way we don't need to add yet another ioctl... > > Christian