On 07/11/2024 10:31, Barry Song wrote:
> On Thu, Nov 7, 2024 at 11:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>>
>> On Thu, Nov 7, 2024 at 5:23 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 22/10/2024 00:28, Barry Song wrote:
>>>>> From: Tangquan Zheng <zhengtangquan@xxxxxxxx>
>>>>>
>>>>> +static int zram_bvec_write_multi_pages(struct zram *zram, struct bio_vec *bvec,
>>>>> +				u32 index, int offset, struct bio *bio)
>>>>> +{
>>>>> +	if (is_multi_pages_partial_io(bvec))
>>>>> +		return zram_bvec_write_multi_pages_partial(zram, bvec, index, offset, bio);
>>>>> +	return zram_write_page(zram, bvec->bv_page, index);
>>>>> +}
>>>>> +
>>>
>>> Hi Barry,
>>>
>>> I started reviewing this series just to get a better idea of whether we can do
>>> something similar for zswap. I haven't looked at the zram code before, so this
>>> might be a basic question:
>>> How would you end up in zram_bvec_write_multi_pages_partial if using zram for swap?
>>
>> Hi Usama,
>>
>> There's a corner case where, for instance, a 32KiB mTHP is swapped out. If
>> userspace then performs MADV_DONTNEED on the 0~16KiB portion of this original
>> mTHP, it now consists of 8 swap entries (the mTHP has been released and
>> unmapped). With swap0-swap3 released due to DONTNEED, they become available
>> for reallocation, and other folios may be swapped out to those entries. The
>> result is then a mix of the new, smaller folios and the original 32KiB mTHP.
>

Hi Barry,

Thanks for this. So in this example of a 32K folio, when swap slots 0-3 are
released, zram_slot_free_notify will only clear the ZRAM_COMP_MULTI_PAGES flag
on indexes 0-3 and return (without calling zram_free_page on them). I am
assuming that if another folio is now swapped out to those entries, zram allows
those pages to be overwritten, even though they haven't been freed?

Also, even if that is allowed, I still don't think you will end up in
zram_bvec_write_multi_pages_partial when you try to write a 16K or smaller
folio to swap0-3. Since want_multi_pages_comp will evaluate to false because
16K is less than 32K, you will just end up in zram_bio_write_page?

Thanks,
Usama

> Sorry, I forgot to mention that the assumption is ZSMALLOC_MULTI_PAGES_ORDER=3,
> so data is compressed in 32KiB blocks.
>
> With Chris' and Kairui's new swap optimization, this should be minor, as each
> cluster has its own order. However, I recall that order-0 can still steal swap
> slots from other orders' clusters when swap space is limited, by scanning all
> slots? Please correct me if I'm wrong, Kairui and Chris.
>
>>
>>>
>>> We only swap out whole folios. If ZCOMP_MULTI_PAGES_SIZE=64K, any folio smaller
>>> than 64K will end up in zram_bio_write_page. Folios greater than or equal to
>>> 64K would be dispatched by zram_bio_write_multi_pages to
>>> zram_bvec_write_multi_pages in 64K chunks. So, e.g., a 128K folio would end up
>>> calling zram_bvec_write_multi_pages twice.
>>
>> In v2, I changed the default order to 2, allowing all anonymous mTHP to
>> benefit from this feature.
>>
>>>
>>> Or is this for the case when you are using zram not for swap? In that case, I
>>> probably don't need to consider the zram_bvec_write_multi_pages_partial write
>>> case for zswap.
>>>
>>> Thanks,
>>> Usama
>>
>
> Thanks
> barry
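
P.S. To make the corner case above concrete, here is a minimal, standalone
simulation, not the actual zram code: the struct and helper names are made up
for illustration, and it only mirrors the behaviour described in this thread
(the free notification clearing the ZRAM_COMP_MULTI_PAGES-style flag for slots
0-3 without freeing the shared compressed block), assuming
ZSMALLOC_MULTI_PAGES_ORDER=3 so a 32KiB mTHP spans 8 slots:

#include <stdbool.h>
#include <stdio.h>

#define MULTI_PAGES_NR 8	/* order 3: pages per compressed block */

/* Illustrative per-slot state, not the real zram slot table. */
struct slot {
	bool in_use;
	bool multi_pages;	/* stands in for ZRAM_COMP_MULTI_PAGES */
};

/* Mirrors the behaviour discussed above: on swap slot free, only drop the
 * multi-pages flag; the shared compressed block is not freed because slots
 * 4-7 still reference it. */
static void free_notify(struct slot *slots, unsigned int index)
{
	slots[index].multi_pages = false;
	slots[index].in_use = false;
}

int main(void)
{
	struct slot slots[MULTI_PAGES_NR];
	unsigned int i;

	/* 32KiB mTHP swapped out: all 8 slots belong to one compressed block. */
	for (i = 0; i < MULTI_PAGES_NR; i++)
		slots[i] = (struct slot){ .in_use = true, .multi_pages = true };

	/* MADV_DONTNEED on the first 16KiB releases slots 0-3; they can now be
	 * reused by a new, smaller folio even though the 32KiB block that also
	 * backs slots 4-7 is still live. */
	for (i = 0; i < 4; i++)
		free_notify(slots, i);

	for (i = 0; i < MULTI_PAGES_NR; i++)
		printf("slot %u: in_use=%d multi_pages=%d\n",
		       i, slots[i].in_use, slots[i].multi_pages);
	return 0;
}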