Re: Readahead for compressed data

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Fri, 22 Oct 2021 10:31:03 +0800

On Fri, Oct 22, 2021 at 03:09:03AM +0100, Phillip Lougher wrote:
> On 22/10/2021 02:04, Phillip Susi wrote:
> > 
> > Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
> > 
> > > As far as I can tell, the following filesystems support compressed data:
> > > 
> > > bcachefs, btrfs, erofs, ntfs, squashfs, zisofs
> > > 
> > > I'd like to make it easier and more efficient for filesystems to
> > > implement compressed data.  There are a lot of approaches in use today,
> > > but none of them seem quite right to me.  I'm going to lay out a few
> > > design considerations next and then propose a solution.  Feel free to
> > > tell me I've got the constraints wrong, or suggest alternative solutions.
> > > 
> > > When we call ->readahead from the VFS, the VFS has decided which pages
> > > are going to be the most useful to bring in, but it doesn't know how
> > > pages are bundled together into blocks.  As I've learned from talking to
> > > Gao Xiang, sometimes the filesystem doesn't know either, so this isn't
> > > something we can teach the VFS.
> > > 
> > > We (David) added readahead_expand() recently to let the filesystem
> > > opportunistically add pages to the page cache "around" the area requested
> > > by the VFS.  That reduces the number of times the filesystem has to
> > > decompress the same block.  But it can fail (due to memory allocation
> > > failures or pages already being present in the cache).  So filesystems
> > > still have to implement some kind of fallback.
> > 
> > Wouldn't it be better to keep the *compressed* data in the cache and
> > decompress it multiple times if needed rather than decompress it once
> > and cache the decompressed data?  You would use more CPU time
> > decompressing multiple times
> 
> There is a fairly obvious problem here.   A malicious attacker could
> "trick" the filesystem into endlessly decompressing the same blocks,
> which because the compressed data is cached, could cause it to use
> all CPU, and cause a DOS attack.  Caching the uncompressed data prevents
> these decompression attacks from occurring so easily.

Yes, that is another good point. But with artifact memory pressure,
there is no difference to cache compressed data or decompressed data,
otherwise these pages will be unreclaimable, reclaim any of cached
decompressed data will also need decompress the whole pcluster.

The same to zram or zswap or what else software compression solution,
what we can do is to limit the CPU utilization if any such requirement.
But that is quite hard for lz4 since in-memory lz4 decompression speed
is usually fast than the backend storage.

By the way, as far as I know there were some experimental hardware
accelerator integrated to storage devices. So they don't need to expand
decompress pages as well. And do inplace I/O only for such devices.

Thanks,
Gao Xiang

> 
> Phillip
> 
>