Re: [PATCH RFC 2/2] zram: support compression at the granularity of multi-pages

On 07/11/2024 10:31, Barry Song wrote:
> On Thu, Nov 7, 2024 at 11:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>>
>> On Thu, Nov 7, 2024 at 5:23 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 22/10/2024 00:28, Barry Song wrote:
>>>>> From: Tangquan Zheng <zhengtangquan@xxxxxxxx>
>>>>>
>>>>> +static int zram_bvec_write_multi_pages(struct zram *zram, struct bio_vec *bvec,
>>>>> +                       u32 index, int offset, struct bio *bio)
>>>>> +{
>>>>> +    if (is_multi_pages_partial_io(bvec))
>>>>> +            return zram_bvec_write_multi_pages_partial(zram, bvec, index, offset, bio);
>>>>> +    return zram_write_page(zram, bvec->bv_page, index);
>>>>> +}
>>>>> +
>>>
>>> Hi Barry,
>>>
>>> I started reviewing this series just to get a better idea if we can do something
>>> similar for zswap. I haven't looked at zram code before so this might be a basic
>>> question:
>>> How would you end up in zram_bvec_write_multi_pages_partial if using zram for swap?
>>
>> Hi Usama,
>>
>> There's a corner case where, for instance, a 32KiB mTHP is swapped out.
>> If userspace then performs MADV_DONTNEED on the 0~16KiB portion of that
>> original mTHP, it now consists of 8 swap entries (the mTHP has been
>> released and unmapped). With swap0-swap3 released due to DONTNEED, they
>> become available for reallocation, and other folios may be swapped out
>> to those entries. The result is then a combination of the new smaller
>> folios with the original 32KiB mTHP.
> 

Hi Barry,

Thanks for this. So in this example of a 32K folio, when swap slots 0-3 are
released, zram_slot_free_notify will only clear the ZRAM_COMP_MULTI_PAGES
flag on indexes 0-3 and return (without calling zram_free_page on them).
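
Just to make sure I follow, the sequence would be something like the
purely illustrative userspace sketch below (assuming 4KiB pages, that the
anon mTHP sysfs settings let the kernel back this range with an order-3
mTHP, and using MADV_PAGEOUT to stand in for whatever actually triggers
the swap-out; error handling omitted):

    #include <sys/mman.h>
    #include <string.h>

    int main(void)
    {
            size_t sz = 32 * 1024;
            char *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            memset(buf, 0xaa, sz);                  /* fault in the 32KiB mTHP */
            madvise(buf, sz, MADV_PAGEOUT);         /* swap out the whole folio */
            madvise(buf, 16 * 1024, MADV_DONTNEED); /* drop 0~16KiB, freeing swap0-swap3 */
            return 0;
    }

after which swap0-swap3 can be handed out to other, possibly smaller,
folios.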

I am assuming that if another folio is now swapped out to those entries,
zram allows those pages to be overwritten, even though they haven't been freed?

Also, even if it's allowed, I still don't think you will end up in
zram_bvec_write_multi_pages_partial when you try to write a 16K or
smaller folio to swap0-3. Since want_multi_pages_comp will evaluate to
false because 16K is less than 32K, won't you just end up in
zram_bio_write_page?
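
(For reference, the arithmetic behind my question, as I read the series:
ZCOMP_MULTI_PAGES_SIZE is PAGE_SIZE shifted by ZSMALLOC_MULTI_PAGES_ORDER,
so with 4KiB pages and order 3 the multi-page unit is 32KiB, and a 16KiB
folio fails the size check. A tiny standalone sketch, with the macro
definitions being my assumption rather than the actual ones from the
patches:

    #include <stdio.h>

    #define PAGE_SHIFT                 12   /* 4KiB pages assumed */
    #define ZSMALLOC_MULTI_PAGES_ORDER 3    /* the order Barry mentions below */
    #define ZCOMP_MULTI_PAGES_SIZE     (1UL << (PAGE_SHIFT + ZSMALLOC_MULTI_PAGES_ORDER))

    int main(void)
    {
            unsigned long folio_size = 16 * 1024;

            printf("multi-page unit: %lu KiB\n", ZCOMP_MULTI_PAGES_SIZE / 1024);
            printf("16KiB folio takes the multi-page path? %s\n",
                   folio_size >= ZCOMP_MULTI_PAGES_SIZE ? "yes" : "no");
            return 0;
    }

which is why I would expect such a write to take the per-page path.)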

Thanks,
Usama


> Sorry, I forgot to mention that the assumption is ZSMALLOC_MULTI_PAGES_ORDER=3,
> so data is compressed in 32KiB blocks.
> 
> With Chris' and Kairui's new swap optimization, this should be minor, as
> each cluster has its own order. However, I recall that order-0 can still
> steal swap slots from other orders' clusters when swap space is limited,
> by scanning all slots? Please correct me if I'm wrong, Kairui and Chris.
> 
>>
>>>
>>> We only swap out whole folios. If ZCOMP_MULTI_PAGES_SIZE=64K, any folio smaller
>>> than 64K will end up in zram_bio_write_page. Folios greater than or equal to 64K
>>> would be dispatched by zram_bio_write_multi_pages to zram_bvec_write_multi_pages
>>> in 64K chunks. So, for example, a 128K folio would end up calling
>>> zram_bvec_write_multi_pages twice.
>>
>> In v2, I changed the default order to 2, allowing all anonymous mTHP to
>> benefit from this feature.
>>
>>>
>>> Or is this for the case when you are using zram not for swap? In that case, I probably
>>> don't need to consider the zram_bvec_write_multi_pages_partial write case for zswap.
>>>
>>> Thanks,
>>> Usama
>>
> 
> Thanks
> barry




