On 28/10/2024 21:15, Barry Song wrote:
> On Tue, Oct 29, 2024 at 4:51 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>
>> On 28/10/2024 20:42, Barry Song wrote:
>>> On Tue, Oct 29, 2024 at 4:00 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>>
>>>> On 28/10/2024 19:54, Barry Song wrote:
>>>>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 28/10/2024 17:08, Yosry Ahmed wrote:
>>>>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 28/10/2024 16:33, Nhat Pham wrote:
>>>>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> I wonder if, instead of having counters, it might be better to keep track
>>>>>>>>>> of the number of zeropages currently stored in the zeromap, similar to how
>>>>>>>>>> zswap_same_filled_pages did it. It would be more complicated than this
>>>>>>>>>> patch, but would give more insight into the current state of the system.
>>>>>>>>>>
>>>>>>>>>> Joshua (in CC) was going to have a look at that.
>>>>>>>>>
>>>>>>>>> I don't think one can substitute for the other.
>>>>>>>>
>>>>>>>> Yes, agreed, they have separate uses and provide different information, but
>>>>>>>> maybe it is wasteful to have both types of counters? Being counters, they
>>>>>>>> may not consume too many resources, but I think we should still think
>>>>>>>> about it.
>>>>>>>
>>>>>>> Not for or against here, but I would say that statement is debatable
>>>>>>> at best for memcg stats :)
>>>>>>>
>>>>>>> Each new counter consumes 2 longs per-memcg per-CPU (see
>>>>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot, but it can
>>>>>>> quickly add up with a large number of CPUs/memcgs/stats.
>>>>>>>
>>>>>>> Also, when flushing the stats we iterate all of them to propagate
>>>>>>> updates from per-CPU counters.
>>>>>>> This is already a slowpath, so adding
>>>>>>> one stat is not a big deal, but again, because we iterate all stats on
>>>>>>> multiple CPUs (and sometimes on each node as well), the overall flush
>>>>>>> latency sometimes becomes a concern.
>>>>>>>
>>>>>>> All of that is not to say we shouldn't add more memcg stats, but we
>>>>>>> have to be mindful of the resources.
>>>>>>
>>>>>> Yes, agreed! Plus the cost of incrementing similar counters (which, of
>>>>>> course, is also not much).
>>>>>>
>>>>>> Not trying to block this patch in any way. I just think this is a good
>>>>>> place to discuss whether we are OK with both types of counters and, if
>>>>>> that is too wasteful, which one we should have.
>>>>>
>>>>> Hi Usama,
>>>>> my point is that with all three of the counters below:
>>>>> 1. PSWPIN/PSWPOUT
>>>>> 2. ZSWPIN/ZSWPOUT
>>>>> 3. SWAPIN_SKIP/SWAPOUT_SKIP (or ZEROSWPIN/ZEROSWPOUT, whatever the name)
>>>>>
>>>>> shouldn't we be able to determine the portion of zeromap
>>>>> swap indirectly?
>>>>
>>>> Hmm, I might be wrong, but I would have thought not.
>>>>
>>>> What if you swap out a zero folio, but then discard it?
>>>> zeromap_swpout would be incremented, but zeromap_swpin would not.
>>>
>>> I understand. It looks like we have two issues to tackle:
>>> 1. We shouldn't let zeromap swap in or out anything that vanishes into
>>> a black hole.
>>> 2. We want to find out how much I/O/memory has been saved by zeromap so far.
>>>
>>> From my perspective, issue 1 requires a "fix", while issue 2 is more
>>> of an optimization.
>>
>> Hmm, I don't understand why point 1 would be an issue.
>>
>> If it's discarded, that's fine as far as I can see.
>
> It is fine to you, and probably to me, who know zeromap well :-) but
> any userspace code like the one below might be entirely confused:
>
> p = malloc(1G);
> write p to 0; or write part of p to 0
> madv_pageout(p, 1g)
> read p to swap in.
>
> The entire procedure used to involve 1GB of swap out and 1GB of swap in by any
> means.
> Now, it has recorded 0 swaps counted.
>
> I don't expect userspace to be as smart as you :-)

Ah, I completely agree; we need to account for it in some metric. I probably
misunderstood when you said "We shouldn't let zeromap swap in or out anything
that vanishes into a black hole" as meaning we should not apply the zeromap
optimization in those cases. What I guess you meant is that we need to account
for it in some metric.

>>
>> As a reference, memory.stat.zswapped != memory.stat.zswpout - memory.stat.zswpin,
>> because zswapped takes into account swapped-out anon memory that is later
>> freed, MADV_FREE, shmem truncate, etc. (as Yosry said about zeromap), but
>> zswpout and zswpin don't.
>
> I understand. However, I believe what we really need to focus on is
> this: if we've swapped out, for instance, 100GB in the past hour, how much of
> that 100GB is zero? This information can help us assess the proportion of
> zero data in the workload, along with the potential benefits that zeromap can
> provide for memory, I/O space, or read/write operations. Additionally, having
> the second count can enhance accuracy when considering MADV_DONTNEED,
> MADV_FREE, truncate, and so on.

Yes, completely agree! I think we can look into adding all three metrics:
zeromap_swapped, zeromap_swpout, and zeromap_swpin (or whatever names work).

>>>
>>> I consider issue 1 to be more critical because, after observing a phone
>>> running for some time, I've been able to roughly estimate the portion that
>>> zeromap can help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP, even
>>> without a SWPIN counter. However, I agree that issue 2 still holds
>>> significant value as a separate patch.
>
> Thanks
> Barry
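The indirect estimate Barry describes (using only the swap-out side counters) amounts to simple arithmetic over the three counter families. A sketch, using the candidate counter names from this thread and entirely hypothetical sample numbers:

```python
def zeromap_swapout_fraction(pswpout, zswpout, swapout_skip):
    """Fraction of swapped-out pages that were all-zero, i.e. pages that
    only flipped a zeromap bit instead of being written to disk or zswap."""
    total = pswpout + zswpout + swapout_skip
    return swapout_skip / total if total else 0.0

# Hypothetical sample: 70k pages to disk, 25k to zswap, 5k zero pages skipped.
print(zeromap_swapout_fraction(70_000, 25_000, 5_000))  # 0.05
```

As the thread notes, this only estimates the swap-out side; discarded pages (MADV_DONTNEED, MADV_FREE, truncate) mean the swap-in side cannot be derived the same way, which is why a separate zeromap_swapped-style gauge is also being considered.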