Re: [PATCH] btrfs: zlib: do not do unnecessary page copying for compression

Zaslonko Mikhail <zaslonko@xxxxxxxxxxxxx> · Tue, 28 May 2024 12:44:19 +0200

Hello Qu,

On 28.05.2024 00:09, Qu Wenruo wrote:
> 
> 
> 在 2024/5/28 01:55, Zaslonko Mikhail 写道:
>> Hello Qu,
>>
>> I remember implementing btrfs zlib changes related to s390 dfltcc compression support a while ago:
>> https://lwn.net/Articles/808809/
>>
>> The workspace buffer size was indeed enlarged for performance reasons.
>>
>> Please see my comments below.
>>
>> On 27.05.2024 11:24, Qu Wenruo wrote:
>>> [BUG]
>>> In function zlib_compress_folios(), we handle the input by:
>>>
>>> - If there are multiple pages left
>>>    We copy the page content into workspace->buf, and use workspace->buf
>>>    as input for compression.
>>>
>>>    But on x86_64 (which doesn't support dfltcc), that buffer size is just
>>>    one page, so we're wasting our CPU time copying the page for no
>>>    benefit.
>>>
>>> - If there is only one page left
>>>    We use the mapped page address as input for compression.
>>>
>>> The problem is, this means we will copy the whole input range except the
>>> last page (can be as large as 124K), without much obvious benefit.
>>>
>>> Meanwhile the cost is pretty obvious.
>>
>> Actually, the behavior for kernels w/o dfltcc support (currently available on s390
>> only) should not be affected.
>> We copy input pages to the workspace->buf only if the buffer size is larger than 1 page.
>> At least it worked this way after my original btrfs zlib patch:
>> https://lwn.net/ml/linux-kernel/20200108105103.29028-1-zaslonko@xxxxxxxxxxxxx/
>>
>> Has this behavior somehow changed after your page->folio conversion performed for btrfs?
>> https://lore.kernel.org/all/cover.1706521511.git.wqu@xxxxxxxx/
> 
> My bad, I forgot that the buf_size for non-S390 systems is fixed to one
> page thus the page copy is not utilized for x86_64.
> 
> But I'm still wondering if we do not go 4 pages as buffer, how much
> performance penalty would there be?
> 
> One of the objective is to prepare for the incoming sector perfect
> subpage compression support, thus I'm re-checking the existing
> compression code, preparing to change them to be subpage compatible.
> 
> If we can simplify the behavior without too large performance penalty,
> can we consider just using one single page as buffer?

Based on my earlier estimates, bigger buffer provided up to 60% performance for inflate and up to 30% for
deflate on s390 with dfltcc support.
I don't think giving it away for simplification would be a good idea.

> 
> Thanks,
> Qu