On Tue, Oct 29, 2024 at 4:51 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > > > > On 28/10/2024 20:42, Barry Song wrote: > > On Tue, Oct 29, 2024 at 4:00 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > >> > >> > >> > >> On 28/10/2024 19:54, Barry Song wrote: > >>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > >>>> > >>>> > >>>> > >>>> On 28/10/2024 17:08, Yosry Ahmed wrote: > >>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 28/10/2024 16:33, Nhat Pham wrote: > >>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote: > >>>>>>>> > >>>>>>>> I wonder if instead of having counters, it might be better to keep track > >>>>>>>> of the number of zeropages currently stored in zeromap, similar to how > >>>>>>>> zswap_same_filled_pages did it. It will be more complicated then this > >>>>>>>> patch, but would give more insight of the current state of the system. > >>>>>>>> > >>>>>>>> Joshua (in CC) was going to have a look at that. > >>>>>>> > >>>>>>> I don't think one can substitute for the other. > >>>>>> > >>>>>> Yes agreed, they have separate uses and provide different information, but > >>>>>> maybe wasteful to have both types of counters? They are counters so maybe > >>>>>> dont consume too much resources but I think we should still think about > >>>>>> it.. > >>>>> > >>>>> Not for or against here, but I would say that statement is debatable > >>>>> at best for memcg stats :) > >>>>> > >>>>> Each new counter consumes 2 longs per-memcg per-CPU (see > >>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot but it can > >>>>> quickly add up with a large number of CPUs/memcgs/stats. > >>>>> > >>>>> Also, when flushing the stats we iterate all of them to propagate > >>>>> updates from per-CPU counters. This is already a slowpath so adding > >>>>> one stat is not a big deal, but again because we iterate all stats on > >>>>> multiple CPUs (and sometimes on each node as well), the overall flush > >>>>> latency becomes a concern sometimes. > >>>>> > >>>>> All of that is not to say we shouldn't add more memcg stats, but we > >>>>> have to be mindful of the resources. > >>>> > >>>> Yes agreed! Plus the cost of incrementing similar counters (which ofcourse is > >>>> also not much). > >>>> > >>>> Not trying to block this patch in anyway. Just think its a good point > >>>> to discuss here if we are ok with both types of counters. If its too wasteful > >>>> then which one we should have. > >>> > >>> Hi Usama, > >>> my point is that with all the below three counters: > >>> 1. PSWPIN/PSWPOUT > >>> 2. ZSWPIN/ZSWPOUT > >>> 3. SWAPIN_SKIP/SWAPOUT_SKIP or (ZEROSWPIN, ZEROSWPOUT what ever) > >>> > >>> Shouldn't we have been able to determine the portion of zeromap > >>> swap indirectly? > >>> > >> > >> Hmm, I might be wrong, but I would have thought no? > >> > >> What if you swapout a zero folio, but then discard it? > >> zeromap_swpout would be incremented, but zeromap_swapin would not. > > > > I understand. It looks like we have two issues to tackle: > > 1. We shouldn't let zeromap swap in or out anything that vanishes into > > a black hole > > 2. We want to find out how much I/O/memory has been saved due to zeromap so far > > > > From my perspective, issue 1 requires a "fix", while issue 2 is more > > of an optimization. > > Hmm I dont understand why point 1 would be an issue. > > If its discarded thats fine as far as I can see. it is fine to you and probably me who knows zeromap as well :-) but any userspace code as below might be entirely confused: p = malloc(1G); write p to 0; or write part of p to 0 madv_pageout(p, 1g) read p to swapin. The entire procedure used to involve 1GB of swap out and 1GB of swap in by any means. Now, it has recorded 0 swaps counted. I don't expect userspace is as smart as you :-) > > As a reference, memory.stat.zswapped != memory.stat.zswapout - memory.stat.zswapin. > Because zswapped would take into account swapped out anon memory freed, MADV_FREE, > shmem truncate, etc as Yosry said about zeromap, But zswapout and zswapin dont. I understand. However, I believe what we really need to focus on is this: if we’ve swapped out, for instance, 100GB in the past hour, how much of that 100GB is zero? This information can help us assess the proportion of zero data in the workload, along with the potential benefits that zeromap can provide for memory, I/O space, or read/write operations. Additionally, having the second count can enhance accuracy when considering MADV_DONTNEED, FREE, TRUNCATE, and so on. > > > > > > I consider issue 1 to be more critical because, after observing a phone > > running for some time, I've been able to roughly estimate the portion > > zeromap can > > help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP, even without a > > SWPIN counter. However, I agree that issue 2 still holds significant value > > as a separate patch. > > Thanks Barry