On Tue, Sep 19, 2023 at 10:59:33AM -0700, Andy Lutomirski wrote:
>
> I'm not complaining about the name. I'm objecting to the semantics.
>
> Apparently you have a system to collect usage statistics of physical
> addresses, but you have no idea what those pages map to (without
> crawling /proc or /sys, anyway). But that means you have no idea when
> the logical contents of those pages *changes*. So you fundamentally
> have a nasty race: anything else that swaps or migrates those pages
> will mess up your statistics, and you'll start trying to migrate the
> wrong thing.

How does this change if I use virtual-address-based migration?

I could do sampling based on virtual address (page faults, IBS/PEBS,
whatever), and by the time I make a decision the kernel could have
migrated the data, or even my task, from Node A to Node B. The sample
I took is now stale, and I could make a poor migration decision.

If I do move_pages(pid, some_virt_addr, some_node) and it migrates the
page from Node A to Node B, then the device-side collection is likewise
no longer valid. This problem doesn't change because I used a virtual
address instead of a physical address.

But if I have a 512GB memory device, and I can see that a wide swath of
that 512GB is hot while a good chunk of my local DRAM is not, then I
probably don't care *what* gets migrated up to DRAM; I just care that
the vast majority of that hot data does.

The goal here isn't 100% precision; you will never get there. The goal
is a broad-scope performance improvement to the overall system while
minimizing the cost of computing the migration actions to be taken.

I don't think the contents of the page are always relevant. The entire
concept here is to enable migration without caring what programs are
using the memory for, so long as memcgs and zoning are respected.

~Gregory
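
P.S. For anyone unfamiliar with the virtual-address interface mentioned
above, here is a minimal sketch of the move_pages() path using the
libnuma wrapper. The target node (1) and the 4096-byte page size are
arbitrary assumptions for illustration, not anything from this thread:

    #include <numaif.h>   /* move_pages(); build with -lnuma */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Allocate one page-aligned page and touch it so it is
         * resident on some node before we ask to move it. */
        void *page;
        if (posix_memalign(&page, 4096, 4096))
            return 1;
        ((char *)page)[0] = 0;

        void *pages[1]  = { page };
        int   nodes[1]  = { 1 };   /* assumed target node */
        int   status[1] = { -1 };

        /* pid 0 means the calling process; MPOL_MF_MOVE moves only
         * pages not shared with another process. On return, status[0]
         * holds the page's node, or a negative errno on failure. */
        long rc = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);
        if (rc < 0)
            perror("move_pages");
        else
            printf("page status/node: %d\n", status[0]);
        return 0;
    }

Even here, nothing stops the kernel from migrating the page again the
moment the call returns, which is exactly the staleness problem
described above.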