On 05/09/2019 04:10, Dave Chinner wrote:
> On Wed, Sep 04, 2019 at 12:13:26PM -0700, Omar Sandoval wrote:
>> From: Omar Sandoval <osandov@xxxxxx>
>>
>> This adds an API for writing compressed data directly to the filesystem.
>> The use case that I have in mind is send/receive: currently, when
>> sending data from one compressed filesystem to another, the sending side
>> decompresses the data and the receiving side recompresses it before
>> writing it out. This is wasteful and can be avoided if we can just send
>> and write compressed extents. The send part will be implemented in a
>> separate series, as this ioctl can stand alone.
>>
>> The interface is essentially pwrite(2) with some extra information:
>>
>> - The input buffer contains the compressed data.
>> - Both the compressed and decompressed sizes of the data are given.
>> - The compression type (zlib, lzo, or zstd) is given.
>
> So why can't you do this with pwritev2()? Heaps of flags, and
> use a second iovec to hold the decompressed size of the previous
> iovec. i.e.
>
> 	iov[0].iov_base = compressed_data;
> 	iov[0].iov_len = compressed_size;
> 	iov[1].iov_base = NULL;
> 	iov[1].iov_len = uncompressed_size;
> 	pwritev2(fd, iov, 2, offset, RWF_COMPRESSED_ZLIB);
>
> And you don't need to reinvent pwritev() with some whacky ioctl that
> is bound to be completely screwed up in ways not noticed until
> someone else tries to use it...
>
> I'd also suggest that if we are going to be able to write compressed
> data directly, then we should be able to read them as well directly
> via preadv2()....

While I'm with you on this from a design PoV, one question remains:

What to do with the file systems that do not support compression?

Currently there's only a kernel-global check for known RWF_* flags in
kiocb_set_rw_flags().
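
To make that concrete, here is a rough userspace model of the situation.
It is not kernel code: every MODEL_* name, the flag values, struct
model_fs and its rwf_supported mask are made up for illustration. The
first function is loosely modelled on a single global mask check, the
second shows one possible shape a per-filesystem opt-in could take:

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values chosen for this sketch. */
#define MODEL_RWF_HIPRI           0x001u
#define MODEL_RWF_COMPRESSED_ZLIB 0x100u

/* One mask for the whole "kernel": every flag that is known at all. */
#define MODEL_RWF_SUPPORTED \
	(MODEL_RWF_HIPRI | MODEL_RWF_COMPRESSED_ZLIB)

/* Hypothetical per-filesystem description with an opt-in flag mask. */
struct model_fs {
	const char *name;
	unsigned int rwf_supported;	/* flags this filesystem accepts */
};

/* Global check: rejects unknown flags, knows nothing about the target. */
static int model_global_check(unsigned int flags)
{
	if (flags & ~MODEL_RWF_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}

/*
 * Assumed opt-in check, done once in a common layer before calling
 * into the filesystem, so individual write paths need no changes.
 */
static int model_fs_check(const struct model_fs *fs, unsigned int flags)
{
	int ret = model_global_check(flags);

	if (ret)
		return ret;
	if (flags & ~fs->rwf_supported)
		return -EOPNOTSUPP;
	return 0;
}
```

With these definitions, model_global_check(MODEL_RWF_COMPRESSED_ZLIB)
returns 0 no matter which filesystem is the target, while
model_fs_check() returns -EOPNOTSUPP for a filesystem whose mask lacks
the flag.
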
So we need a way for the individual file systems to opt into the new
RWF_COMPRESSED_* flags and fail early if they're not supported. That will
cause a lot of code churn if we cannot do it in the VFS layer.

Of the 52 ->write_iter callbacks in fs/, 32 are not using
generic_file_write_iter(). So we'd have to patch 33 functions (+/- 1-2
because my grep | wc fu isn't the best).

Any ideas?

Byte,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
(HRB 247165, AG München)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850