Writing this from my phone so apologies in advance if it messes up
formatting somewhere.

On Mon, Mar 17, 2025, at 9:22 PM, Jason Gunthorpe wrote:
> On Sun, Mar 16, 2025 at 08:52:43PM -0700, David Rientjes wrote:
[...]
>> Pratyush noted there was no way to preserve folio orders in KHO and he
>> also noted there was a need for page flags.
>
> I think the xarray idea will preserve folio orders, that was a big
> point of it.
>
> Not clear why we'd need to preserve page flags. The same page flags
> may not even exist in the new kernel? New kernel should set the page
> flags correctly based on what it is doing. Shouldn't, say, memfd know
> exactly what it's page flags should be in the new kernel when adopting
> the memory?

I didn't mean the exact flags value, but the ability to have per-folio
flags. The exact bits and their meaning would of course need to be part
of the ABI.

Shmem uses the dirty and uptodate flags to track some state on the
folios, and the flags can affect its behavior (lazily zeroing out
falloc-ed pages for example). I am assuming other FD types or drivers
might also want to store per-folio information. Having KHO core provide
this facility can avoid duplicating the logic in each subsystem.

That said, I don't think this is a blocking feature that should be
present from the get go. I would be happy if it is, since that would
make the shmem flag tracking easy, but for now I can have a separate
property to track this.

>
>> Pasha asked how cgroups would be handled, but there was no current
>> support for that. Pratyush said the current RFC focused on anon memfd
>> and has not yet looked at hugetlb. Pasha emphasized the importance of
>> focusing on one type of memory to start.
>
> I'd say userspace should deal with this. It should de-serialize the FD
> within the context of the cgroup it wants to charge that FD too, and
> the de-serializing process should charge that cgroups accounting with
> whatever is restored inside the FD.
>
> Is that possible?

For FDBox, it is certainly possible. In the current patch version,
deserialization happens on boot so it can't be done, but in later
versions I want to give userspace control over when to deserialize. So
whichever context triggers that gets charged.

[...]

-- 
Regards,
Pratyush Yadav