On Fri, Oct 15, 2021 at 02:45:46PM +0300, Nikolay Borisov wrote:
>
>
> On 1.09.21 г. 20:01, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@xxxxxx>
> >
> > There are 4 main cases:
> >
> > 1. Inline extents: we copy the data straight out of the extent buffer.
> > 2. Hole/preallocated extents: we fill in zeroes.
> > 3. Regular, uncompressed extents: we read the sectors we need directly
> >    from disk.
> > 4. Regular, compressed extents: we read the entire compressed extent
> >    from disk and indicate what subset of the decompressed extent is in
> >    the file.
> >
> > This initial implementation simplifies a few things that can be improved
> > in the future:
> >
> > - We hold the inode lock during the operation.
> > - Cases 1, 3, and 4 allocate temporary memory to read into before
> >   copying out to userspace.
> > - We don't do read repair, because it turns out that read repair is
> >   currently broken for compressed data.
> >
> > Signed-off-by: Omar Sandoval <osandov@xxxxxx>
> > ---
> >  fs/btrfs/ctree.h |   4 +
> >  fs/btrfs/inode.c | 489 +++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/btrfs/ioctl.c | 111 +++++++++++
> >  3 files changed, 604 insertions(+)
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index b95ec5fb68d5..cbd7e07c1c34 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -3223,6 +3223,10 @@ int btrfs_writepage_cow_fixup(struct page *page);
> >  void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
> >  					  struct page *page, u64 start,
> >  					  u64 end, bool uptodate);
> > +struct btrfs_ioctl_encoded_io_args;
> > +ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
> > +			   struct btrfs_ioctl_encoded_io_args *encoded);
> > +
> >  extern const struct dentry_operations btrfs_dentry_operations;
> >  extern const struct iomap_ops btrfs_dio_iomap_ops;
> >  extern const struct iomap_dio_ops btrfs_dio_ops;
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index a87a34f56234..1940f22179ba 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -10500,6 +10500,495 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end)
> >  	}
> >  }
> >
>
> <snip>
>
> > +
> > +static blk_status_t btrfs_encoded_read_check_bio(struct btrfs_io_bio *io_bio)
>
> nit: The gist of this function is to check the csum so how about
> renaming it to btrfs_encoded_read_verify_csum

Good point, will do.

> > +
> > +static void btrfs_encoded_read_endio(struct bio *bio)
> > +{
> > +	struct btrfs_encoded_read_private *priv = bio->bi_private;
> > +	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
> > +	blk_status_t status;
> > +
> > +	status = btrfs_encoded_read_check_bio(io_bio);
> > +	if (status) {
> > +		/*
> > +		 * The memory barrier implied by the atomic_dec_return() here
> > +		 * pairs with the memory barrier implied by the
> > +		 * atomic_dec_return() or io_wait_event() in
>
> nit: I think atomic_dec_return in read_regular_fill_pages is
> inconsequential, what we want to ensure is that when the caller of
> io_wait_event is woken up by this thread it will observe the
> priv->status, which it will, because the atomic_dec_return in this
> function has paired with the general barrier interpolated by wait_event.
>
> So for brevity just leave the text to say "by io_wait_event".

Considering that there is a code path in
btrfs_encoded_read_regular_fill_pages() where atomic_dec_return() is
called but io_wait_event() isn't, I think it's important to mention
both. (This path should be fairly rare, since it can only happen if all
of the bios are completed before the submitting thread checks the
pending counter. But, it's a possible code path, and I wouldn't want
someone reading the comment to be confused and think that we're missing
a barrier in that case.)

> > +		 * btrfs_encoded_read_regular_fill_pages() to ensure that this
> > +		 * write is observed before the load of status in
> > +		 * btrfs_encoded_read_regular_fill_pages().
> > +		 */
> > +		WRITE_ONCE(priv->status, status);
> > +	}
> > +	if (!atomic_dec_return(&priv->pending))
> > +		wake_up(&priv->wait);
> > +	btrfs_io_bio_free_csum(io_bio);
> > +	bio_put(bio);
> > +}
>
> <snip>
>
> > @@ -4824,6 +4841,94 @@ static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat)
> >  	return ret;
> >  }
> >
>
> <snip>
>
> > +	memset((char *)&args + copy_end_kernel, 0,
> > +	       sizeof(args) - copy_end_kernel);
>
> nit: This memset can be eliminated (in source) by marking args = {};
> and just leaving copy from user above.

args = {} results in slightly worse generated code, but that doesn't
really matter here, so I'll change it.

> > +
> > +	ret = import_iovec(READ, args.iov, args.iovcnt, ARRAY_SIZE(iovstack),
> > +			   &iov, &iter);
> > +	if (ret < 0)
> > +		goto out_acct;
> > +
> > +	if (iov_iter_count(&iter) == 0) {
> > +		ret = 0;
> > +		goto out_iov;
> > +	}
> > +	pos = args.offset;
> > +	ret = rw_verify_area(READ, file, &pos, args.len);
> > +	if (ret < 0)
> > +		goto out_iov;
> > +
> > +	init_sync_kiocb(&kiocb, file);
> > +	ret = kiocb_set_rw_flags(&kiocb, 0);
>
> This call is a noop due to:
> 	if (!flags)
> 		return 0;
>
> in kiocb_set_rw_flags.

Good catch, I'll remove that call.