On Thu, Jul 9, 2020 at 1:26 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> From: Sargun Dhillon <sargun@xxxxxxxxx>
>
> The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
> an fd. It is often used in settings where a supervising task emulates
> syscalls on behalf of a supervised task in userspace, either to further
> restrict the supervisee's syscall abilities or to circumvent kernel
> enforced restrictions the supervisor deems safe to lift (e.g. actually
> performing a mount(2) for an unprivileged container).
>
> While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
> only a certain subset of syscalls could be correctly emulated. Over the
> last few development cycles, the set of syscalls which can't be emulated
> has been reduced due to the addition of pidfd_getfd(2). With this we are
> now able to, for example, intercept syscalls that require the supervisor
> to operate on file descriptors of the supervisee such as connect(2).
>
> However, syscalls that cause new file descriptors to be installed can not
> currently be correctly emulated since there is no way for the supervisor
> to inject file descriptors into the supervisee. This patch adds a
> new addfd ioctl to remove this restriction by allowing the supervisor to
> install file descriptors into the intercepted task. By implementing this
> feature via seccomp the supervisor effectively instructs the supervisee
> to install a set of file descriptors into its own file descriptor table
> during the intercepted syscall. This way it is possible to intercept
> syscalls such as open() or accept(), and install (or replace, like
> dup2(2)) the supervisor's resulting fd into the supervisee. One
> replacement use-case would be to redirect the stdout and stderr of a
> supervisee into log file descriptors opened by the supervisor.
>
> The ioctl handling is based on the discussions[1] of how Extensible
> Arguments should interact with ioctls. Instead of building size into
> the addfd structure, make it a function of the ioctl command (which
> is how sizes are normally passed to ioctls). To support forward and
> backward compatibility, just mask out the direction and size, and match
> everything. The size (and any future direction) checks are done along
> with copy_struct_from_user() logic.
>
> As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
> alignment without requiring packing as there have been packing issues
> with uapi highlighted before[2][3]. Although we could overload the
> newfd field and use -1 to indicate that it is not to be used, doing
> so requires changing the size of the fd field, and introduces struct
> packing complexity.
>
> [1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@xxxxxxxxxxxxxxxxx/
> [2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@xxxxxxxxxxxxxxxxxx/
> [3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal
>
> Suggested-by: Matt Denton <mpdenton@xxxxxxxxxx>
> Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@xxxxxxxxx
> Signed-off-by: Sargun Dhillon <sargun@xxxxxxxxx>
> Co-developed-by: Kees Cook <keescook@xxxxxxxxxxxx>
> Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>

Reviewed-by: Will Drewry <wad@xxxxxxxxxxxx>
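
For readers following the "mask out the direction and size, and match everything" paragraph, here is a rough userspace sketch of the idea. It is not the code from the patch: the struct names, the DEMO_* macros, the command number and the extra field in the second struct are all made up for illustration. The only real interfaces used are the _IOW/_IOC_* helpers from <linux/ioctl.h>. The first struct also shows a layout where every field sits at its natural offset, so no packing attribute is needed.

```c
#include <linux/ioctl.h>   /* _IOW, _IOC_SIZE, _IOC_DIRMASK, _IOC_SIZEMASK */
#include <stdint.h>

/* Hypothetical stand-in for an addfd-style argument: one 64-bit field
 * followed by four 32-bit fields, so the struct is 24 bytes with no
 * padding and no packing attribute. */
struct addfd_v1 {
	uint64_t id;
	uint32_t flags;
	uint32_t srcfd;
	uint32_t newfd;
	uint32_t newfd_flags;
};

/* Hypothetical later revision that appends one field (assumption, not
 * something the patch defines). */
struct addfd_v2 {
	uint64_t id;
	uint32_t flags;
	uint32_t srcfd;
	uint32_t newfd;
	uint32_t newfd_flags;
	uint64_t extra;
};

/* Made-up magic and number; the size is encoded by _IOW from sizeof(). */
#define DEMO_IOC_MAGIC	'!'
#define DEMO_ADDFD_V1	_IOW(DEMO_IOC_MAGIC, 3, struct addfd_v1)
#define DEMO_ADDFD_V2	_IOW(DEMO_IOC_MAGIC, 3, struct addfd_v2)

/* Clear the direction and size bits, leaving only type and number. */
#define DEMO_EA_IOCTL(cmd) \
	((cmd) & ~((_IOC_DIRMASK << _IOC_DIRSHIFT) | \
		   (_IOC_SIZEMASK << _IOC_SIZESHIFT)))

/* Returns nonzero if cmd is the addfd command, whatever size it carries. */
static int demo_is_addfd(unsigned int cmd)
{
	return DEMO_EA_IOCTL(cmd) == DEMO_EA_IOCTL(DEMO_ADDFD_V1);
}

/* Size the caller compiled against, recovered from the command itself. */
static unsigned int demo_arg_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}
```

With this, DEMO_ADDFD_V1 and DEMO_ADDFD_V2 are different command values but both satisfy demo_is_addfd(), and demo_arg_size() gives the handler the length it would then validate in a copy_struct_from_user()-style step, as the commit message describes.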