Hi Dave,
On 03/18, David Disseldorp wrote:
> Hi Enzo,
>
> ...
>
> On Thu 14-03-24 15:14:49, Enzo Matsumiya wrote:
> > Hello,
> >
> > Having implemented data compression for SMB2 messages in cifs.ko, I'd
> > like to attend LSF/MM to discuss:
> >
> > - implementation decisions, both in the protocol level and in the
> >   compression algorithms; e.g. performance improvements, what could,
> >   if possible/wanted, turn into a lib/ module, etc
> >
> > - compression algorithms in general; talk about algorithms to determine
> >   if/how compressible a blob of data is
> >   * several such algorithms already exist and are used by on-disk
> >     compression tools, but for over-the-wire compression maybe the
> >     fastest one with good (not great nor best) predictability
> >     could work?
>
> Ideally there could be some overlap between on-disk and over-the-wire
> compression algorithm support. That could allow optimally aligned /
> sized IOs to avoid unnecessary compression / decompression cycles on an
> SMB server / client if the underlying filesystem supports encoded I/O
> via e.g. BTRFS_IOC_ENCODED_READ/WRITE.
That's exactly the kind of discussion I had in mind when I mentioned
'modules/subsystems with such overlapping requirements/desire'. It's
not only about the feature/integration perspective: performance is
something I really want to get right from the beginning.
Which brings me to the 'how to detect incompressible data' subject.
Practical test at hand: when writing this 289MiB ISO file to an SMB
share with compression enabled, only 7 out of 69 WRITE requests
(~10%) are compressed.

(This is not a problem in itself, since SMB2 compression is supposed
to be done on a best-effort basis.)
So, best effort... for ~90% of the requests for this particular ISO
file, cifs.ko "compressed" the request, ended up with an output size
>= the input size, discarded it all, and sent the original
uncompressed request instead => lots of CPU cycles wasted. It would be
nice to skip compression for such data right off the bat, or at least
after only minimal parsing.
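
To make the idea concrete, here is a minimal user-space sketch in
Python, not the cifs.ko code. It assumes a Shannon entropy check on a
small sample as the cheap pre-check, and uses zlib only as a stand-in
for the SMB2 algorithms. All names, the sample size, and the threshold
are hypothetical and would need tuning against real workloads.

```python
import math
import zlib
from collections import Counter
from typing import Tuple

SAMPLE_SIZE = 4096        # hypothetical: bytes inspected per request
ENTROPY_THRESHOLD = 7.5   # hypothetical cutoff in bits per byte (max 8.0)


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_incompressible(buf: bytes) -> bool:
    """Cheap pre-check: only the first SAMPLE_SIZE bytes are inspected."""
    return shannon_entropy(buf[:SAMPLE_SIZE]) > ENTROPY_THRESHOLD


def best_effort_compress(buf: bytes) -> Tuple[bytes, bool]:
    """Return (payload, was_compressed), falling back to the original data."""
    if looks_incompressible(buf):
        return buf, False          # skip the compressor entirely
    out = zlib.compress(buf, 1)    # zlib stands in for the SMB2 algorithms
    if len(out) >= len(buf):
        return buf, False          # no gain: send the request uncompressed
    return out, True
```

Sampling only the head of the buffer keeps the check cheap, but it can
misjudge mixed content (e.g. a compressible header followed by already
compressed data), so the compress-and-compare fallback is kept.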
> IIUC, we currently have:
> SMB: LZ77, LZ77+Huffman (DEFLATE?), LZNT1, LZ4
> Btrfs: zlib/DEFLATE, LZO, Zstd
> Bcachefs: zlib/DEFLATE, LZ4, Zstd. Currently no encoded I/O support.
The algorithms required by SMB2 look generic at first glance, but
due to some minor yet very important implementation details, I
couldn't get a Windows Server to decompress a DEFLATE'd buffer, for
example. So I'm not really sure how such integration with other
subsystems would play out.

LZ4 might change this, but I haven't implemented it yet (btw thanks for
pointing me to its support in the newest MS-SMB2 :)).
Cheers,
Enzo