On Tue, Oct 29, 2024 at 6:54 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Mon, Oct 28, 2024 at 3:52 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >
> > On Tue, Oct 29, 2024 at 6:33 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > [..]
> > > > > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > > > > earlier - for example - before allocating swap slots and pageout(), could
> > > > > > we completely eliminate swap slot occupation and allocation/release
> > > > > > for zeromap data? For example, we could use a special swap
> > > > > > entry value in the PTE to indicate zero content and directly fill it with
> > > > > > zeros when swapping back. We've observed that swap slot allocation and
> > > > > > freeing can consume a lot of CPU and slow down functions like
> > > > > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > > > > could improve performance. However, I'm uncertain about the benefits we
> > > > > > would gain if we only have 1-2% zeromap data.
> > > > >
> > > > > If I remember correctly this was one of the ideas floated around in the
> > > > > initial version of the zeromap series, but it was evaluated as a lot more
> > > > > complicated to do than what the current zeromap code looks like. But I
> > > > > think its definitely worth looking into!
> > >
> > > Yup, I did suggest this on the first version:
> > > https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@xxxxxxxxxxxxxx/
> > >
> > > , and Usama took a stab at implementing it in the second version:
> > > https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@xxxxxxxxx/
> > >
> > > David and Shakeel pointed out a few problems. I think they are
> > > fixable, but the complexity/benefit tradeoff was getting unclear at
> > > that point.
> > >
> > > If we can make it work without too much complexity, that would be
> > > great of course.
> > >
> > > >
> > > > Sorry for the noise. I didn't review the initial discussion. But my feeling
> > > > is that it might be valuable considering the report from Zhiguo:
> > > >
> > > > https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@xxxxxxxx/
> > > >
> > > > In fact, our recent benchmark also indicates that swap free could account
> > > > for a significant portion in do_swap_page().
> > >
> > > As Shakeel mentioned in a reply to Usama's patch mentioned above, we
> > > would need to check the contents of the page after it's unmapped. So
> > > likely we need to allocate a swap slot, walk the rmap and unmap, check
> > > contents, walk the rmap again and update the PTEs, free the swap slot.
> > >
> >
> > So the issue is that we can't check the content before allocating slots and
> > unmapping during reclamation? If we find the content is zero, can we skip
> > all slot operations and go directly to rmap/unmap by using a special PTE?
>
> We need to unmap first before checking the content, otherwise the
> content can change right after we check it.

Well, do we have a way to terminate the unmap if we find pte_dirty and
ensure the folio is still mapped after try_to_unmap_one()? Then we could
activate it again after try_to_unmap. It might just be noise. Let me take
some more time to think about it. :-)
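
To make the ordering concern quoted above concrete, here is a minimal
user-space sketch in C. It is a toy model, not kernel code: toy_page,
page_is_zero(), racing_write() and the two reclaim_* helpers are
hypothetical names introduced only for illustration, and the concurrent
writer is simulated by a callback that runs between the two steps. As a
simplifying assumption, a write to an unmapped toy page is silently
dropped; on a real system it would fault instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a page that a process may still have mapped. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool mapped;    /* while true, a "user" can still write to data */
};

/* Callback simulating a writer that races with reclaim. */
typedef void (*racer_fn)(struct toy_page *page);

/* Return true if every byte of the page is zero. */
bool page_is_zero(const struct toy_page *page)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        if (page->data[i] != 0)
            return false;
    return true;
}

/* Toy writer: the store only lands while the page is still mapped. */
void racing_write(struct toy_page *page)
{
    if (page->mapped)
        page->data[0] = 0xff;
}

/*
 * Check first, unmap second. The verdict is computed while the page is
 * still writable, so it can be stale by the time the page is unmapped.
 */
bool reclaim_check_then_unmap(struct toy_page *page, racer_fn racer)
{
    bool zero = page_is_zero(page);

    racer(page);            /* the write sneaks in after the check */
    page->mapped = false;
    return zero;
}

/*
 * Unmap first, check second. Once the page is unmapped the toy writer
 * cannot modify it, so the verdict matches the final contents.
 */
bool reclaim_unmap_then_check(struct toy_page *page, racer_fn racer)
{
    page->mapped = false;
    racer(page);            /* page is unmapped, the write cannot land */
    return page_is_zero(page);
}
```

With racing_write() as the racer, reclaim_check_then_unmap() reports
"zero" for a page that ends up non-zero, while reclaim_unmap_then_check()
returns a verdict that agrees with the page's final contents.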