On 08/10/2022 14:52, Ard Biesheuvel wrote:
> [...]
>> This is exactly what 842 (sw compress) is doing now. If that's
>> interesting and Kees agrees, and if nobody else plans on doing that, I
>> could work on it.
>>
>> Extra question (maybe silly on my side?): is it possible that
>> _compressed_ data is bigger than the original one? Isn't there any
>> "protection" in the compress APIs for that? In that case, it'd be a
>> pure waste of time / CPU cycles heheh
>>
>
> No, this is the whole point of those helper routines, as far as I can
> tell. Basically, if you put data that cannot be compressed losslessly
> (e.g., an H264 video) through a lossless compression routine, the
> resulting data will be bigger due to the overhead of the compression
> metadata.
>
> However, we are compressing ASCII text here, so using the uncompressed
> size as an upper bound for the compressed size is reasonable for any
> compression algorithm. And if dmesg output is not compressible, there
> must be something seriously wrong with it.
>
> So we could either just drop such input, or simply not bother
> compressing it if doing so would only take up more space. Given the
> low likelihood that we will ever hit this case, I'd say we just ignore
> those.
>
> Again, please correct me if I am missing something here (Kees?). Are
> there cases where we compress data that may be compressed already?

This is an interesting point of view, thanks for sharing! And it's possible to kinda test it - I did that in the past to test the maximum size of ramoops buffers, but I didn't output the values to compare compressed vs. uncompressed sizes (since I didn't need that info at the time).

The trick I used was: suppose I'm using lz4; I polluted dmesg with a huge amount of already-lz4-compressed garbage, then provoked a crash. I'll try it again to grab the sizes heheh
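
For what it's worth, the two behaviors discussed above - incompressible input growing slightly because of compression metadata, while repetitive ASCII log text shrinks a lot - are easy to demonstrate from userspace. A quick sketch, using Python's zlib purely as a stand-in for the kernel-side compressors (lz4, 842, etc.; the exact overhead differs per algorithm, but the effect is the same):

```python
import os
import zlib

# Repetitive ASCII text, similar in spirit to dmesg output.
text = b"[    0.000000] Booting Linux on physical CPU 0x0\n" * 1000

# High-entropy bytes, standing in for already-compressed garbage.
garbage = os.urandom(len(text))

for label, data in (("ascii text", text), ("random bytes", garbage)):
    out = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(out)} bytes")

# The ASCII text shrinks dramatically; the random bytes come out
# slightly *larger* than the input, since the compressor falls back
# to stored blocks plus header/checksum overhead.
```

So the "protection" really does have to live in the caller: compare the output size against the input size and keep the uncompressed copy if compression didn't help.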