Re: [RFC] Per-file compression

I've done some investigation into stacking compression on top of an existing filesystem with the ideas you suggested. I'm not really much of a filesystem guy, so maybe I'm off base. But here's what I've come up with so far:

Provide a function that the underlying filesystem can call to wrap the inode. This allows a new inode to be created and passed back to the VFS layer. Since this won't be a stacking filesystem, I was thinking of using new_inode_pseudo for this purpose, as is done in pipe.c. The underlying inode is stored and referenced via inode->i_private.
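As a rough pseudocode sketch of that wrapping step (kernel-C style, not compilable as-is; the comprfs_* operation table names are hypothetical and all error handling is elided):

```
/* Pseudocode: wrap a lower inode in a compression-aware pseudo inode. */
struct inode *comprfs_wrap_inode(struct inode *lower)
{
	struct inode *inode = new_inode_pseudo(lower->i_sb);

	if (!inode)
		return ERR_PTR(-ENOMEM);

	inode->i_private = igrab(lower);	 /* keep the backing inode pinned */
	inode->i_mapping->a_ops = &comprfs_aops; /* readpage decompresses chunks */
	inode->i_op = &comprfs_iops;		 /* getattr reports uncompressed size */
	inode->i_fop = &comprfs_fops;		 /* generic_file_* where possible */
	return inode;
}
```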

The compressed inode implements all necessary operations for, e.g., read, write, and mmap, using generic_* functions where appropriate. This mostly leaves inode_operations.getattr and address_space_operations.readpage to be implemented.

getattr is implemented by calling the underlying getattr and then substituting in the uncompressed file size.

readpage is implemented by finding the compressed offset for the requested chunk of data and reading the underlying pages, decompressing the chunk, and copying out the desired data. I'm looking at the squashfs implementation for clues as to how this should be done.
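To make the offset lookup in that readpage plan concrete, here is a small userspace sketch. The directory layout (one starting offset per compressed chunk, plus a final sentinel entry marking the end of the last chunk) is an assumption for illustration, not something squashfs or any existing format mandates:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Uncompressed chunk size from the proposal: 64k. */
#define CHUNK_SIZE 65536u

/* Compressed extent that must be read and decompressed to satisfy a
 * request at a given uncompressed offset. */
struct chunk_extent {
	uint64_t start;	/* byte offset of the compressed chunk in the file */
	uint64_t len;	/* compressed length of the chunk */
};

/* dir[i] holds the starting byte of compressed chunk i; dir[nchunks] is a
 * sentinel marking the end of the last chunk.  Range checking is elided. */
static struct chunk_extent lookup_chunk(const uint64_t *dir, uint64_t upos)
{
	size_t i = upos / CHUNK_SIZE;
	struct chunk_extent ext = { dir[i], dir[i + 1] - dir[i] };

	return ext;
}
```

readpage would then read the underlying pages covering [start, start+len), decompress the chunk, and copy the requested page out of the result.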

Does that sound like a reasonable plan, or am I off base?

On 04/21/2015 08:18 AM, Theodore Ts'o wrote:
On Mon, Apr 20, 2015 at 06:51:03PM +0200, Richard Weinberger wrote:
My thought was that compression is not far from crypto, and hence
a lot of eCryptfs could be reused.
The problem with using eCryptfs as a base is that it assumes the
encryption is constant-sized --- i.e., that a 4096-byte plaintext block
encrypts to a 4096-byte ciphertext block.  This is *not* true for compression.

The other problem with eCryptfs is that since the underlying file
system doesn't know it's being stacked, you end up burning memory for
both the plaintext and ciphertext versions of the file.  This is one
of the reasons why eCryptfs wasn't considered for future versions of
Android; instead, we've added encryption into the ext4 file system
layer.  (With most of the interesting bits in separate files, and
I've been communicating with the f2fs maintainer so that f2fs can
add the same encryption feature.)

For compression, what I'd recommend doing is something similar; do it
at the file system level, but structure it such that it's relatively
easy for other file systems to reuse "library code" for the core data
transforms.  However, allow the underlying file system to use its own
specialized storage for things like flags, xattrs, etc., since it can
be made more efficient.

What I'd also suggest is that you support read-only compression (which
is what MacOS did as well), and do it by using a chunk size of, say, 32k
or 64k.  At the very end of the file, store a pointer to the
compressed chunk directory, which is simply a header describing
the chunk size (and other useful bits, such as the compression
algorithm, and *possibly* space for a preset compression dictionary
shared across all of the chunks, if that makes sense), followed by
a list of offsets into the file giving the starting offset
for chunk #0, chunk #1, chunk #2, etc.
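One way that directory could be laid out on disk, sketched in userspace C (the struct fields, magic value, and serialization are illustrative assumptions; the preset dictionary is mentioned but not modelled here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory layout: fixed header, then (not modelled here) an
 * optional preset dictionary, then one uint64_t starting offset per chunk.
 * A uint64_t at the very end of the file would point back at the header. */
struct compr_header {
	uint32_t magic;		/* identifies the format */
	uint32_t chunk_size;	/* uncompressed chunk size, e.g. 65536 */
	uint32_t algorithm;	/* compression algorithm id */
	uint32_t nchunks;	/* number of entries in the offset list */
};

/* Serialize the header followed by the offset list; returns bytes written. */
static size_t write_directory(uint8_t *buf, const struct compr_header *h,
			      const uint64_t *offsets)
{
	size_t pos = 0;

	memcpy(buf + pos, h, sizeof(*h));
	pos += sizeof(*h);
	memcpy(buf + pos, offsets, h->nchunks * sizeof(uint64_t));
	pos += h->nchunks * sizeof(uint64_t);
	return pos;
}
```

A real format would also pin down endianness and alignment; this just shows the shape of the metadata.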

This file would be created with some help from a userspace
application; said userspace application would do the compression and
write out the compressed file, and then call an ioctl which sets an
attribute that (a) flushes the (compressed) file data from the page
cache, and (b) marks the inode as read-only and containing
compressed data.
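The final "mark it compressed" step of that userspace helper might look like this. Note the hedge in the comment: reusing the existing FS_IOC_SETFLAGS ioctl with FS_COMPR_FL is only an assumption for illustration, since no ioctl with the flush-and-mark-read-only semantics described above exists yet:

```c
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Hypothetical final step: after the compressed image (chunks plus
 * directory) has been written to fd, ask the kernel to drop the cached
 * (compressed) pages and flag the inode as read-only compressed data.
 * Reusing FS_IOC_SETFLAGS with FS_COMPR_FL here is an assumption; the
 * design would more likely want a dedicated ioctl with exactly the
 * flush + mark-read-only semantics. */
static int mark_file_compressed(int fd)
{
	int flags;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -1;
	flags |= FS_COMPR_FL;
	return ioctl(fd, FS_IOC_SETFLAGS, &flags);
}
```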

When the kernel reads from the file, it reads the compression header
and directory, and then pages into the page cache a chunk at a time
--- that is, if userspace requests a single 4k page, the kernel will
read in whatever blocks are needed to decompress the 64k chunk
containing that page, and populate the page cache with that 64k chunk.
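The one-chunk-at-a-time read granularity is simple arithmetic; a userspace sketch of the mapping, using the 4k page and 64k chunk sizes from the text:

```c
#include <assert.h>

#define PAGE_4K 4096u
#define CHUNK_64K 65536u
#define PAGES_PER_CHUNK (CHUNK_64K / PAGE_4K)	/* 16 pages per chunk */

/* Which chunk must be decompressed to satisfy a given 4k page index? */
static unsigned chunk_of_page(unsigned pgidx)
{
	return pgidx / PAGES_PER_CHUNK;
}

/* First page index populated when that chunk is decompressed; pages
 * [first, first + PAGES_PER_CHUNK) all come from the same chunk. */
static unsigned first_page_of_chunk(unsigned chunk)
{
	return chunk * PAGES_PER_CHUNK;
}
```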

I've sketched this design out a few times, hoping to interest someone
into implementing it for ext4, but this is the sort of thing that
could be implemented as a library, and then easily spliced into
multiple file systems.

Cheers,

					- Ted

P.S.  Note that one of the things about this design is that although
it requires userspace support, it's *perfect* for files which are
installed via a package, whether that be an RPM, dpkg, or apk.  You
just need to create a userspace library which takes the incoming file
stream from the package file, and then writes out the compressed
version of the file and marks the file as containing compressed data.
It shouldn't be hard, once the userspace library is created, to modify
rpm, dpkg, etc., to take advantage of this feature.  And these package
files are the ones which are *perfect* candidates for compression;
they tend to be written once, and read many times, and in general they
are read-only.  (Yes, there are exceptions for config files, but rpm
and dpkg already have a way of specifying which files are config
files, which is important if you want to verify that the unpacked
package is consistent with what was installed originally.)

