On 07.12.20 12:30, yulei.kernel@xxxxxxxxx wrote:
> From: Yulei Zhang <yuleixzhang@xxxxxxxxxxx>
>
> In the current system each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But due to memory usage growing rapidly in cloud environments,
> we find the resource consumption for page structure storage becomes
> more and more remarkable. So is it possible that we could reclaim
> such memory and make it reusable?
>
> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
>
> Dmemfs (Direct Memory filesystem) is a device memory or reserved
> memory based filesystem. This kind of memory is special as it
> is not managed by the kernel and, most importantly, it is without
> 'struct page'. Therefore we can leverage the extra memory from the
> host system to support more tenants in our cloud service.

"is not managed by the kernel" -- well, it obviously is managed by the kernel. It's just not managed by the buddy ;)

How is this different from using "mem=X" and mapping the relevant memory directly into applications? Is this "simply" a control instance on top that makes sure unprivileged processes can access it and don't step on each other's feet? Is that the reason why it's called a "file system"? (An example would have helped here, showing how it's used.)

It's worth noting that memory hotunplug, memory poisoning and probably more are currently fundamentally incompatible with this approach - which should better be pointed out in the cover letter.

Also, I think something similar can be obtained by using the dax/hmat infrastructure with "memmap=" - at least I remember a talk where this was discussed (but I'm not sure if they modified the firmware to expose selected memory as soft-reserved - we would only need a cmdline parameter to achieve the same - Dan might know more).
> As the figure below shows, we use a kernel boot parameter 'dmem='
> to reserve system memory when the host system boots up. The
> remaining system memory is still managed by the system memory
> management, which is associated with "struct page", while the
> reserved memory will be managed by dmem and assigned to the guest
> system. The details can be checked in
> /Documentation/admin-guide/kernel-parameters.txt.
>
>   +------------------+--------------------------------------+
>   |  system memory   |       memory for guest system        |
>   +------------------+--------------------------------------+
>            |                              |
>            v                              |
>       struct page                         |
>            |                              |
>            v                              v
>   system mem management                 dmem
>
> During usage, dmemfs will handle the memory requests to allocate
> and free the reserved memory on each NUMA node. User space
> applications can leverage the mmap interface to access the memory,
> and kernel modules such as kvm and vfio are able to pin the memory
> through follow_pfn() and get_user_pages() in different given
> page size granularities.

I cannot say that I really like this approach. If all this is about is reducing the memmap size, I really prefer the proposal to free up most vmemmap pages for huge/gigantic pages instead.

-- 
Thanks,

David / dhildenb