Re: fs compression

"Theodore Ts'o" <tytso@xxxxxxx> · Wed, 20 May 2015 17:36:41 -0400

On Wed, May 20, 2015 at 10:46:35AM -0700, Tom Marshall wrote:
> So I've been playing around a bit and I have a basic strategy laid out.
> Please let me know if I'm on the right track.
> 
> Compressed file attributes
> ==========================
> 
> The filesystem is responsible for detecting whether a file is compressed and
> hooking into the compression lib.  This may be done with an inode flag,
> xattr, or any other applicable method.  No other special attributes are
> necessary.

So I assume what you are implementing is read-only compression; that
is, once the file has been written and the attribute marking it as a
compressed file has been set, the file is immutable.

> Compressed file format
> ======================
> 
> Compressed files shall have header, block map, and data sections.
> 
> Header:
> 
> byte[4]		magic		'zzzz' (not strictly needed)
> byte		param1		method and flags
> 	bits 0..3 = compression method (1=zlib, 2=lz4, etc.)
> 	bits 4..7 = flags (none defined yet)
> byte		blocksize	log2 of blocksize (max 31)

I suggest using the term "compression cluster" to distinguish this
from the file system block size.

> le48		orig_size	original uncompressed file size
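A minimal C sketch of parsing the proposed 12-byte header (4 bytes magic, 1 byte param1, 1 byte blocksize, 6 bytes orig_size). The struct, the function names and the "cluster_log2" field name are hypothetical. The sketch checks the 'zzzz' magic even though the proposal calls it optional, and it rejects a log2 value above 31 as the proposal specifies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CZ_HEADER_LEN 12   /* 4 magic + 1 param1 + 1 blocksize + 6 orig_size */

struct cz_header {          /* hypothetical in-memory form of the header */
    uint8_t  method;        /* param1 bits 0..3: 1=zlib, 2=lz4, ... */
    uint8_t  flags;         /* param1 bits 4..7: none defined yet */
    uint8_t  cluster_log2;  /* log2 of the compression cluster size */
    uint64_t orig_size;     /* le48: original uncompressed file size */
};

/* Returns 0 on success, -1 if the buffer is short or malformed. */
static int cz_parse_header(const uint8_t *buf, size_t len, struct cz_header *h)
{
    if (len < CZ_HEADER_LEN || memcmp(buf, "zzzz", 4) != 0)
        return -1;
    h->method = buf[4] & 0x0f;
    h->flags = buf[4] >> 4;
    h->cluster_log2 = buf[5];
    if (h->cluster_log2 > 31)
        return -1;
    /* Assemble the 48-bit little-endian size, most significant byte first. */
    h->orig_size = 0;
    for (int i = 5; i >= 0; i--)
        h->orig_size = (h->orig_size << 8) | buf[6 + i];
    return 0;
}

/* Number of compression clusters needed to cover orig_size bytes. */
static uint64_t cz_nr_clusters(const struct cz_header *h)
{
    uint64_t cs = (uint64_t)1 << h->cluster_log2;
    return (h->orig_size + cs - 1) >> h->cluster_log2;
}
```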
> 
> 
> Block map:
> 
> Vector of le16 (if blocksize <= 16) or le32 (if blocksize > 16).  Each entry
> is the compressed size of the block.  Zero indicates that the block is
> stored uncompressed, in case compression expanded the block.
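A rough C sketch of what reading the proposed size-per-cluster map involves (all function names hypothetical). It assumes a zero entry means the cluster is stored raw at its uncompressed length, with the final cluster truncated to orig_size; the proposal does not say how a short final cluster is stored. Note that locating cluster n requires summing every earlier entry, so the cost grows with n.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry n of the map: le16 when the cluster size is at most 64k
 * (cluster_log2 <= 16), le32 otherwise. */
static uint32_t cz_map_entry(const uint8_t *map, unsigned cluster_log2, size_t n)
{
    if (cluster_log2 <= 16)
        return (uint32_t)map[2 * n] | (uint32_t)map[2 * n + 1] << 8;
    const uint8_t *p = map + 4 * n;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bytes cluster n occupies on disk. A zero entry (assumed: stored raw)
 * occupies the cluster's uncompressed length, shorter for the last one. */
static uint64_t cz_stored_len(const uint8_t *map, unsigned cluster_log2,
                              uint64_t orig_size, size_t n)
{
    uint32_t e = cz_map_entry(map, cluster_log2, n);
    if (e != 0)
        return e;
    uint64_t start = (uint64_t)n << cluster_log2;
    uint64_t cs = (uint64_t)1 << cluster_log2;
    return orig_size - start < cs ? orig_size - start : cs;
}

/* Offset of cluster n from the start of the data section: an O(n)
 * prefix sum over all earlier clusters. */
static uint64_t cz_cluster_offset(const uint8_t *map, unsigned cluster_log2,
                                  uint64_t orig_size, size_t n)
{
    uint64_t off = 0;
    for (size_t i = 0; i < n; i++)
        off += cz_stored_len(map, cluster_log2, orig_size, i);
    return off;
}
```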

What I would store instead is a list of 32- or 64-bit offsets, where
the nth entry in the array indicates the starting offset of the nth
compression cluster.
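A C sketch of the offset-table lookup, assuming 64-bit little-endian entries plus a hypothetical trailing sentinel entry that marks the end of the data, so every cluster's length is a simple difference of neighbouring entries. With offsets, locating any cluster is a constant-time table read. Because there is no longer a zero "stored uncompressed" marker, the sketch also assumes a writer stores a cluster raw whenever compression does not shrink it, so a stored length equal to the uncompressed length means raw. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct cz_extent {        /* where one cluster's stored bytes live */
    uint64_t start;       /* byte offset of the stored data */
    uint64_t len;         /* number of stored bytes */
};

/* Decode one little-endian 64-bit table entry. */
static uint64_t le64_at(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* O(1) lookup of cluster n; relies on the assumed sentinel at index
 * nr_clusters so that entry n + 1 always exists. */
static struct cz_extent cz_lookup(const uint8_t *table, size_t n)
{
    struct cz_extent e;
    e.start = le64_at(table + 8 * n);
    e.len = le64_at(table + 8 * (n + 1)) - e.start;
    return e;
}

/* Assumed convention: equal lengths mean the cluster is stored raw. */
static int cz_is_raw(struct cz_extent e, uint64_t raw_len)
{
    return e.len == raw_len;
}
```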

> Questions and issues
> ====================
> 
> Should there be any padding for the data blocks?  For example, if writing is
> to be supported, padding the compressed data to the filesystem block size
> would allow for easy rewriting of individual blocks without disturbing the
> surrounding blocks.  Perhaps padding could be indicated by a flag.

If you add padding then you defeat the whole point of adding
compression.  What if the initial contents of a 64k cluster were all
zeros, so it trivially compresses down to a few dozen bytes, but then
it gets replaced by completely incompressible data?  If you add 64k
worth of padding to each cluster, then you're not saving any space, so
what's the point?
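To put rough numbers on the scenario (the 40-byte figure is hypothetical): an all-zeros 64k cluster compressed to 40 bytes and padded to a 4k file system block occupies 4096 bytes, a real saving. But for any later in-place rewrite to be guaranteed to fit, the slot has to be sized for incompressible data, i.e. the full cluster, and the saving disappears. The helper below is a hypothetical illustration of that arithmetic.

```c
#include <stdint.h>

/* Round n up to a multiple of unit (unit must be nonzero). */
static uint64_t round_up(uint64_t n, uint64_t unit)
{
    return (n + unit - 1) / unit * unit;
}

/* Disk space one cluster's slot occupies. If in-place rewrites must
 * always fit, the slot is sized for the worst case (an incompressible
 * cluster stored raw), not for the data currently in it. */
static uint64_t cz_slot_size(uint64_t compressed, uint64_t cluster_size,
                             uint64_t pad_unit, int allow_rewrite)
{
    return round_up(allow_rewrite ? cluster_size : compressed, pad_unit);
}
```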

> The compression code must be able to read pages from the underlying
> filesystem.  This involves using the pagecache.  But the uncompressed data
> is what ultimately should end up in the pagecache.  This is where I'm
> currently stuck.  How do I implement the code such that the underlying
> compressed data may be read (using the pagecache or not) while not
> disturbing the pagecache for the uncompressed data?  I'm wondering if I need
> to create an internal address_space to pass down into the underlying
> readpage?  Or is there another way to do this?

So I would *not* reference the compressed data via the page cache.  If
you do that, then you end up wasting space in the page cache, since
the page cache will contain both the compressed and decompressed data
--- and once the data has been decompressed, the compressed version is
completely useless.  So it's better to have the file system supply the
physical location on disk, and then to read the compressed data into
a scratch set of pages which is freed immediately after you are done
decompressing things.
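A userspace C sketch of that read path, not kernel code. A byte array stands in for the block device, phys is the physical offset the file system supplies, and the decompressor is a hypothetical function pointer; a toy run-length decoder fills in for zlib or lz4 only so the example runs on its own. It also assumes a cluster whose stored length equals its uncompressed length is stored raw. The point is the lifetime of the scratch buffer: it holds the compressed bytes only until decompression finishes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decompressor hook: returns bytes written to dst, or -1. */
typedef long (*cz_decompress_fn)(const uint8_t *src, size_t src_len,
                                 uint8_t *dst, size_t dst_len);

/* Toy stand-in for a real codec: src is a run of (count, byte) pairs. */
static long toy_rle_decompress(const uint8_t *src, size_t src_len,
                               uint8_t *dst, size_t dst_len)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        if (out + src[i] > dst_len)
            return -1;
        memset(dst + out, src[i + 1], src[i]);
        out += src[i];
    }
    return (long)out;
}

/* Fill one uncompressed cluster in dst (the destination pages). */
static int cz_read_cluster(const uint8_t *disk, uint64_t phys, size_t stored_len,
                           uint8_t *dst, size_t raw_len,
                           cz_decompress_fn decompress)
{
    if (stored_len == raw_len) {              /* assumed: stored raw */
        memcpy(dst, disk + phys, raw_len);
        return 0;
    }
    uint8_t *scratch = malloc(stored_len);
    if (!scratch)
        return -1;
    memcpy(scratch, disk + phys, stored_len); /* stands in for the disk read */
    long n = decompress(scratch, stored_len, dst, raw_len);
    free(scratch);                            /* compressed copy no longer needed */
    return n == (long)raw_len ? 0 : -1;
}
```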

This is why compression is so very different from encryption; the
constraints it imposes are quite different.

Regards,

						- Ted
