On 9/19/19 9:44 AM, Jann Horn wrote:
> On Thu, Sep 19, 2019 at 8:54 AM Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
>> Btrfs can transparently compress data written by the user. However, we'd
>> like to add an interface to write pre-compressed data directly to the
>> filesystem. This adds support for so-called "encoded writes" via
>> pwritev2().
>>
>> A new RWF_ENCODED flags indicates that a write is "encoded". If this
>> flag is set, iov[0].iov_base points to a struct encoded_iov which
>> contains metadata about the write: namely, the compression algorithm and
>> the unencoded (i.e., decompressed) length of the extent. iov[0].iov_len
>> must be set to sizeof(struct encoded_iov), which can be used to extend
>> the interface in the future. The remaining iovecs contain the encoded
>> extent.
>>
>> A similar interface for reading encoded data can be added to preadv2()
>> in the future.
>>
>> Filesystems must indicate that they support encoded writes by setting
>> FMODE_ENCODED_IO in ->file_open().
> [...]
>> +int import_encoded_write(struct kiocb *iocb, struct encoded_iov *encoded,
>> +			 struct iov_iter *from)
>> +{
>> +	if (iov_iter_single_seg_count(from) != sizeof(*encoded))
>> +		return -EINVAL;
>> +	if (copy_from_iter(encoded, sizeof(*encoded), from) != sizeof(*encoded))
>> +		return -EFAULT;
>> +	if (encoded->compression == ENCODED_IOV_COMPRESSION_NONE &&
>> +	    encoded->encryption == ENCODED_IOV_ENCRYPTION_NONE) {
>> +		iocb->ki_flags &= ~IOCB_ENCODED;
>> +		return 0;
>> +	}
>> +	if (encoded->compression > ENCODED_IOV_COMPRESSION_TYPES ||
>> +	    encoded->encryption > ENCODED_IOV_ENCRYPTION_TYPES)
>> +		return -EINVAL;
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>
> How does this capable() check interact with io_uring? Without having
> looked at this in detail, I suspect that when an encoded write is
> requested through io_uring, the capable() check might be executed on
> something like a workqueue worker thread, which is probably running
> with a full capability set.

If we can hit -EAGAIN before doing the import in io_uring, then yes,
this will probably bypass the check as it'll only happen from the
worker.

--
Jens Axboe