Re: [PATCH 2/2] btrfs: add ioctl for directly writing compressed data

On 05/09/2019 04:10, Dave Chinner wrote:
> On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote:
>> From: Omar Sandoval <osandov@xxxxxx>
>>
>> This adds an API for writing compressed data directly to the filesystem.
>> The use case that I have in mind is send/receive: currently, when
>> sending data from one compressed filesystem to another, the sending side
>> decompresses the data and the receiving side recompresses it before
>> writing it out. This is wasteful and can be avoided if we can just send
>> and write compressed extents. The send part will be implemented in a
>> separate series, as this ioctl can stand alone.
>>
>> The interface is essentially pwrite(2) with some extra information:
>>
>> - The input buffer contains the compressed data.
>> - Both the compressed and decompressed sizes of the data are given.
>> - The compression type (zlib, lzo, or zstd) is given.
> 
> So why can't you do this with pwritev2()? Heaps of flags, and
> use a second iovec to hold the decompressed size of the previous
> iovec. i.e.
> 
> 	iov[0].iov_base = compressed_data;
> 	iov[0].iov_len = compressed_size;
> 	iov[1].iov_base = NULL;
> 	iov[1].iov_len = uncompressed_size;
> 	pwritev2(fd, iov, 2, offset, RWF_COMPRESSED_ZLIB);
> 
> And you don't need to reinvent pwritev() with some whacky ioctl that
> is bound to be completely screwed up in ways not noticed until
> someone else tries to use it...
> 
> I'd also suggest that if we are going to be able to write compressed
> data directly, then we should be able to read them as well directly
> via preadv2()....


While I'm with you on this from a design PoV, one question remains:
What to do with the file systems that do not support compression?

Currently there's only a kernel global check for known RWF_* flags in
kiocb_set_rw_flags().

So we need a way for the individual file systems to opt into the new
RWF_COMPRESSED_* flags and fail early if they're not supported. That
will cause a lot of code churn if we cannot do it in the VFS layer.
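
To make the opt-in idea concrete, here is a rough userspace model of
what a VFS-level check could look like. It is only a sketch under
assumptions: RWF_COMPRESSED_ZLIB does not exist (the value below is
made up), and struct fs_model with its extra_rwf mask is a hypothetical
stand-in for wherever a file system would advertise the extra flags it
accepts. The only real part is the idea of rejecting unknown flags with
-EOPNOTSUPP, as the global check does today.

```c
#include <errno.h>
#include <stdint.h>

/* RWF_HIPRI is an existing flag; the compressed flag is hypothetical
 * and its value is invented purely for this illustration. */
#define RWF_HIPRI            0x00000001u
#define RWF_COMPRESSED_ZLIB  0x00000100u /* hypothetical, not real UAPI */

/* Flags that every file system accepts via the generic VFS check. */
#define RWF_GENERIC (RWF_HIPRI)

/* Hypothetical stand-in for a per-file-system opt-in mask. */
struct fs_model {
	const char *name;
	uint32_t extra_rwf; /* additional RWF_* flags this fs supports */
};

/* Model of a single central check: accept only generic flags plus
 * whatever the file system opted into, otherwise fail early. */
int check_rw_flags(const struct fs_model *fs, uint32_t flags)
{
	uint32_t allowed = RWF_GENERIC | fs->extra_rwf;

	if (flags & ~allowed)
		return -EOPNOTSUPP;
	return 0;
}
```

With a mask like this, a file system that sets RWF_COMPRESSED_ZLIB in
extra_rwf passes the check, while one that leaves extra_rwf at 0 gets
-EOPNOTSUPP without its ->write_iter ever being called.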

Of the 52 ->write_iter callbacks in fs/, 32 are not using
generic_file_write_iter(). So we'd have to patch 33 functions (+/- 1-2
because my grep | wc fu isn't the best).

Any ideas?

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 247165, AG München)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


