On Thu, Nov 19, 2020 at 8:03 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Wed, Nov 18, 2020 at 9:18 PM Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
> > The upcoming RWF_ENCODED operation introduces some security concerns:
> >
> > 1. Compressed writes will pass arbitrary data to decompression
> >    algorithms in the kernel.
> > 2. Compressed reads can leak truncated/hole punched data.
> >
> > Therefore, we need to require privilege for RWF_ENCODED. It's not
> > possible to do the permissions checks at the time of the read or write
> > because, e.g., io_uring submits IO from a worker thread. So, add an open
> > flag which requires CAP_SYS_ADMIN. It can also be set and cleared with
> > fcntl(). The flag is not cleared in any way on fork or exec. It must be
> > combined with O_CLOEXEC when opening to avoid accidental leaks (if
> > needed, it may be set without O_CLOEXEC by using fcntl()).
> >
> > Note that the usual issue that unknown open flags are ignored doesn't
> > really matter for O_ALLOW_ENCODED; if the kernel doesn't support
> > O_ALLOW_ENCODED, then it doesn't support RWF_ENCODED, either.
[...]
> > diff --git a/fs/open.c b/fs/open.c
> > index 9af548fb841b..f2863aaf78e7 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -1040,6 +1040,13 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
> >  		acc_mode = 0;
> >  	}
> >
> > +	/*
> > +	 * O_ALLOW_ENCODED must be combined with O_CLOEXEC to avoid accidentally
> > +	 * leaking encoded I/O privileges.
> > +	 */
> > +	if ((how->flags & (O_ALLOW_ENCODED | O_CLOEXEC)) == O_ALLOW_ENCODED)
> > +		return -EINVAL;
> > +
> >
> dup() can also result in accidental leak.
> We could fail dup() of fd without O_CLOEXEC. Should we?
>
> If we should then what error code should it be? We could return EPERM,
> but since we do allow to clear O_CLOEXEC or set O_ALLOW_ENCODED
> after open, EPERM seems a tad harsh.
> EINVAL seems inappropriate because the error has nothing to do with
> input args of dup() and EBADF would also be confusing.

This seems very arbitrary to me. Sure, leaking these file descriptors
wouldn't be great, but there are plenty of other types of file
descriptors that are probably more sensitive. (Writable file
descriptors to databases, to important configuration files, to
io_uring instances, and so on.) So I don't see why this specific
feature should impose such special rules on it.
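For anyone following along, a minimal POSIX C sketch of the dup() point
raised above: dup() always clears FD_CLOEXEC on the new descriptor, so
requiring O_CLOEXEC at open time does not stop a plain dup() copy from
surviving exec(); fcntl(F_DUPFD_CLOEXEC) is the variant that keeps the
flag. The helper names and the use of /dev/null are purely illustrative,
and an ordinary O_CLOEXEC descriptor stands in for one opened with
O_ALLOW_ENCODED, which this sketch does not use.

```c
#define _POSIX_C_SOURCE 200809L /* O_CLOEXEC, F_DUPFD_CLOEXEC (POSIX.1-2008) */
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags < 0)
		return -1;
	return (fdflags & FD_CLOEXEC) ? 1 : 0;
}

/*
 * Illustrative helper: open /dev/null with O_CLOEXEC, duplicate it with
 * dup() and with fcntl(F_DUPFD_CLOEXEC), and report the close-on-exec
 * state of all three descriptors. Returns 0 on success, -1 on failure.
 */
static int probe_dup_cloexec(int *orig, int *plain_dup, int *cloexec_dup)
{
	int ret = -1;
	int fd = open("/dev/null", O_RDONLY | O_CLOEXEC);

	if (fd < 0)
		return -1;

	int d1 = dup(fd);                       /* copy has FD_CLOEXEC cleared */
	int d2 = fcntl(fd, F_DUPFD_CLOEXEC, 0); /* copy has FD_CLOEXEC set */

	if (d1 >= 0 && d2 >= 0) {
		*orig = has_cloexec(fd);
		*plain_dup = has_cloexec(d1);
		*cloexec_dup = has_cloexec(d2);
		ret = 0;
	}
	if (d2 >= 0)
		close(d2);
	if (d1 >= 0)
		close(d1);
	close(fd);
	return ret;
}
```

Called from a small test program, this reports 1 for the original
descriptor, 0 for the dup() copy and 1 for the F_DUPFD_CLOEXEC copy,
which is the gap Amir describes.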