Re: [Lsf-pc] [LSF/MM ATTEND] Over-the-wire data compression

Hi Dave,

On 03/18, David Disseldorp wrote:
> Hi Enzo,
>
> ...
> On Thu 14-03-24 15:14:49, Enzo Matsumiya wrote:
> > Hello,
> >
> > Having implemented data compression for SMB2 messages in cifs.ko, I'd
> > like to attend LSF/MM to discuss:
> >
> > - implementation decisions, both in the protocol level and in the
> >   compression algorithms; e.g. performance improvements, what could,
> >   if possible/wanted, turn into a lib/ module, etc
> >
> > - compression algorithms in general; talk about algorithms to determine
> >   if/how compressible a blob of data is
> >     * several such algorithms already exist and are used by on-disk
> >       compression tools, but for over-the-wire compression maybe the
> >       fastest one with good (not great nor best) predictability
> >       could work?
>
> Ideally there could be some overlap between on-disk and over-the-wire
> compression algorithm support. That could allow optimally aligned /
> sized IOs to avoid unnecessary compression / decompression cycles on an
> SMB server / client if the underlying filesystem supports encoded I/O
> via e.g. BTRFS_IOC_ENCODED_READ/WRITE.

That's exactly the kind of discussion I had in mind when I mentioned
'modules/subsystems with such overlapping requirements/desire'.  It's
not only about the feature/integration perspective; performance is
something I really want to get right (good) from the beginning.

Which brought me to the 'how to detect uncompressible data' subject;
practical test at hand: when writing this 289MiB ISO file to an SMB
share with compression enabled, only 7 out of 69 WRITE requests
(~10%) are compressed.

(this is not the problem since SMB2 compression is supposed to be
done on a best-effort basis)

So, best effort... for 90% of this particular ISO file, cifs.ko "compressed"
those requests, got an output with size >= the input size, discarded it
all, and sent the original uncompressed request instead => lots of CPU
cycles wasted.  It would be nice to skip compression for such data right
off the bat, or at least after minimal parsing.
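
To make the idea concrete, here is a minimal sketch (Python for brevity,
not cifs.ko code) of one possible pre-check: sample a few slices of the
buffer, estimate their byte-level Shannon entropy, and skip compression
when the data looks close to random.  Everything here is an assumption
made for illustration: the names sample_entropy() and maybe_compress(),
the 4096-byte sample, the 7.5 bits/byte cutoff, and zlib, which only
stands in for the SMB2 algorithms.

```python
import math
import os
import zlib
from collections import Counter
from typing import Tuple

SAMPLE_SIZE = 4096    # hypothetical: bytes inspected per request
ENTROPY_LIMIT = 7.5   # hypothetical cutoff in bits per byte (max is 8.0)


def sample_entropy(buf: bytes, sample_size: int = SAMPLE_SIZE) -> float:
    """Shannon entropy (bits/byte) of a sample taken from buf."""
    if not buf:
        return 0.0
    if len(buf) <= sample_size:
        sample = buf
    else:
        # Take 16 slices spread across the buffer, not only its head.
        chunk = sample_size // 16
        step = len(buf) // 16
        sample = b"".join(buf[i * step:i * step + chunk] for i in range(16))
    counts = Counter(sample)
    total = len(sample)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def maybe_compress(buf: bytes) -> Tuple[bytes, bool]:
    """Return (payload, was_compressed) on a best-effort basis."""
    if sample_entropy(buf) > ENTROPY_LIMIT:
        return buf, False          # looks incompressible: skip the work
    out = zlib.compress(buf, 1)    # zlib is only a stand-in codec here
    if len(out) >= len(buf):
        return buf, False          # no gain: send the original instead
    return out, True
```

The size comparison is still needed as a fallback, since a sampled
estimate can misjudge a buffer; the point is only to make the common
"already compressed" case cheap.  The cutoff would have to be tuned
against real workloads.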

> IIUC, we currently have:
> SMB: LZ77, LZ77+Huffman (DEFLATE?), LZNT1, LZ4
> Btrfs: zlib/DEFLATE, LZO, Zstd
> Bcachefs: zlib/DEFLATE, LZ4, Zstd. Currently no encoded I/O support.

The algorithms required by SMB2 look generic from an initial POV,
but due to some minor, but very important, implementation details,
I couldn't make a Windows Server decompress a DEFLATE'd buffer,
for example.  So I'm not really sure how such integration with other
subsystems would play out.

LZ4 might change this, but I haven't implemented it yet (btw thanks for
pointing me to its support in the newest MS-SMB2 :)).


Cheers,

Enzo



