On 1.09.21 г. 20:01, Omar Sandoval wrote: > From: Omar Sandoval <osandov@xxxxxx> > > There are 4 main cases: > > 1. Inline extents: we copy the data straight out of the extent buffer. > 2. Hole/preallocated extents: we fill in zeroes. > 3. Regular, uncompressed extents: we read the sectors we need directly > from disk. > 4. Regular, compressed extents: we read the entire compressed extent > from disk and indicate what subset of the decompressed extent is in > the file. > > This initial implementation simplifies a few things that can be improved > in the future: > > - We hold the inode lock during the operation. > - Cases 1, 3, and 4 allocate temporary memory to read into before > copying out to userspace. > - We don't do read repair, because it turns out that read repair is > currently broken for compressed data. > > Signed-off-by: Omar Sandoval <osandov@xxxxxx> > --- > fs/btrfs/ctree.h | 4 + > fs/btrfs/inode.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/ioctl.c | 111 +++++++++++ > 3 files changed, 604 insertions(+) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index b95ec5fb68d5..cbd7e07c1c34 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -3223,6 +3223,10 @@ int btrfs_writepage_cow_fixup(struct page *page); > void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, > struct page *page, u64 start, > u64 end, bool uptodate); > +struct btrfs_ioctl_encoded_io_args; > +ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, > + struct btrfs_ioctl_encoded_io_args *encoded); > + > extern const struct dentry_operations btrfs_dentry_operations; > extern const struct iomap_ops btrfs_dio_iomap_ops; > extern const struct iomap_dio_ops btrfs_dio_ops; > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index a87a34f56234..1940f22179ba 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -10500,6 +10500,495 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) > } > } > <snip> > + > +static blk_status_t btrfs_encoded_read_check_bio(struct btrfs_io_bio *io_bio) nit: The gist of this function is to check the csum so how about renaming it to btrfs_encoded_read_verify_csum <snip> > + > +static void btrfs_encoded_read_endio(struct bio *bio) > +{ > + struct btrfs_encoded_read_private *priv = bio->bi_private; > + struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); > + blk_status_t status; > + > + status = btrfs_encoded_read_check_bio(io_bio); > + if (status) { > + /* > + * The memory barrier implied by the atomic_dec_return() here > + * pairs with the memory barrier implied by the > + * atomic_dec_return() or io_wait_event() in nit: I think atomic_dec_return in read_regular_fill_pages is inconsequential, what we want to ensure is that when the caller of io_wait_event is woken up by this thread it will observe the priv->status, which it will, because the atomic-dec_return in this function has paired with the general barrier interpolated by wait_event. So for brevity just leave the text to say "by io_wait_event". > + * btrfs_encoded_read_regular_fill_pages() to ensure that this > + * write is observed before the load of status in > + * btrfs_encoded_read_regular_fill_pages(). > + */ > + WRITE_ONCE(priv->status, status); > + } > + if (!atomic_dec_return(&priv->pending)) > + wake_up(&priv->wait); > + btrfs_io_bio_free_csum(io_bio); > + bio_put(bio); > +} <snip> > @@ -4824,6 +4841,94 @@ static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat) > return ret; > } > <snip> > + memset((char *)&args + copy_end_kernel, 0, > + sizeof(args) - copy_end_kernel); nit: This memset can be eliminated ( in source) by marking args = {}; and just leaving copy from user above. > + > + ret = import_iovec(READ, args.iov, args.iovcnt, ARRAY_SIZE(iovstack), > + &iov, &iter); > + if (ret < 0) > + goto out_acct; > + > + if (iov_iter_count(&iter) == 0) { > + ret = 0; > + goto out_iov; > + } > + pos = args.offset; > + ret = rw_verify_area(READ, file, &pos, args.len); > + if (ret < 0) > + goto out_iov; > + > + init_sync_kiocb(&kiocb, file); > + ret = kiocb_set_rw_flags(&kiocb, 0); This call is a noop due to: if (!flags) return 0; in kiocb_set_rw_flags. <snip>