[adding a couple folks that directly or indirectly work on the subject]

On 10/8/20 8:53 AM, yulei.kernel@xxxxxxxxx wrote:
> From: Yulei Zhang <yuleixzhang@xxxxxxxxxxx>
>
> In current system each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But due to the memory usage rapidly growing in cloud environment,
> we find the resource consuming for page structure storage becomes
> highly remarkable. So is it an expense that we could spare?
>

Happy to see another person working to solve the same problem!

I am really glad to see more folks being interested in solving
this problem and I hope we can join efforts?

BTW, there is also a second benefit in removing struct page -
which is carving out memory from the direct map.

> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
>
> Dmemfs (Direct Memory filesystem) is device memory or reserved
> memory based filesystem. This kind of memory is special as it
> is not managed by kernel and most important it is without 'struct page'.
> Therefore we can leverage the extra memory from the host system
> to support more tenants in our cloud service.
>

This is like a walk down memory lane.

About a year ago we followed the same exact idea/motivation to
have memory outside of the direct map (and removing struct page overhead)
and started with our own layer/thingie. However we realized that DAX
is one of the subsystems which already gives you direct access to memory
for free (and is already upstream), plus a couple of things which we
found more handy.

So we sent an RFC a couple months ago:

https://lore.kernel.org/linux-mm/20200110190313.17144-1-joao.m.martins@xxxxxxxxxx/

Since then the majority of the work has been in improving DAX[1].
But now that is done I am going to follow up with the above patchset.

[1] https://lore.kernel.org/linux-mm/159625229779.3040297.11363509688097221416.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

(Give me a couple of days and I will send you the link to the latest
patches on a git-tree - would love feedback!)

The struct page removal for DAX would then be small, and ticks the
same bells and whistles (MCE handling, reserving PAT memtypes, ptrace
support) that we both do, with a smaller diffstat, and it doesn't touch
KVM (at least not fundamentally).

	15 files changed, 401 insertions(+), 38 deletions(-)

The things needed in core-mm are for handling PMD/PUD PAGE_SPECIAL, much
like we both do. Furthermore there wouldn't be a need for a new vm type,
for consuming an extra page bit (in addition to PAGE_SPECIAL), or for a
new filesystem.

> We use a kernel boot parameter 'dmem=' to reserve the system
> memory when the host system boots up, the details can be checked
> in /Documentation/admin-guide/kernel-parameters.txt.
>
> Theoretically for each 4k physical page it can save 64 bytes if
> we drop the 'struct page', so for guest memory with 320G it can
> save about 5G physical memory totally.
>

Also worth mentioning that if you only care about the 'struct page' cost,
and not about the security boundary, there's also some work on hugetlbfs
that tricks vmemmap into reusing tail pages for preallocated hugepages:

https://lore.kernel.org/linux-mm/20200915125947.26204-1-songmuchun@xxxxxxxxxxxxx/

Going forward that could also make sense for device-dax, to avoid so many
struct pages being allocated (which would require its transition to
compound struct pages like hugetlbfs, which we are looking at too).
In addition, an idea <handwaving> would be perhaps to have a stricter
mode in DAX where we initialize/use the metadata ('struct page') but remove
the underlying PFNs (of the 'struct page') from the direct map, at the cost
of mapping/unmapping on gup/pup.

	Joao
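
As a sanity check on the "320G saves about 5G" figure quoted above, the
arithmetic can be sketched as below. This is only an illustration: it
assumes 4 KiB base pages and a 64-byte 'struct page' as stated in the
quoted cover letter, reads "320G" as 320 GiB, and the function name is
made up for this example.

```python
GIB = 2**30


def struct_page_overhead_bytes(mem_bytes, page_size=4096, struct_page_size=64):
    """Bytes of 'struct page' metadata needed to describe mem_bytes of memory.

    Assumes one struct page per base page (no compound-page or vmemmap
    tail-page sharing), which is the case the quoted estimate describes.
    """
    nr_pages = mem_bytes // page_size
    return nr_pages * struct_page_size


if __name__ == "__main__":
    overhead = struct_page_overhead_bytes(320 * GIB)
    # 320 GiB / 4 KiB = 83,886,080 pages; times 64 bytes each = 5 GiB
    print(f"{overhead / GIB:.1f} GiB")
```

With these assumptions the ratio is 64/4096, i.e. roughly 1.6% of the
memory being described, whatever its total size.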