On 07.12.20 12:30, yulei.kernel@xxxxxxxxx wrote:
> From: Yulei Zhang <yuleixzhang@xxxxxxxxxxx>
>
> In the current system each physical memory page is associated with
> a page structure which is used to track the usage of this page.
> But due to memory usage growing rapidly in cloud environments,
> we find the resource consumption for page structure storage becomes
> more and more remarkable. So is it possible that we could reclaim
> such memory and make it reusable?
>
> This patchset introduces an idea about how to save the extra
> memory through a new virtual filesystem -- dmemfs.
>
> Dmemfs (Direct Memory filesystem) is a device memory or reserved
> memory based filesystem. This kind of memory is special as it
> is not managed by the kernel and, most importantly, it is without
> 'struct page'. Therefore we can leverage the extra memory from the
> host system to support more tenants in our cloud service.

"is not managed by the kernel" -- well, it obviously is managed by the kernel. It's just not managed by the buddy ;)

How is this different from using "mem=X" and mapping the relevant memory directly into applications? Is this "simply" a control instance on top that makes sure unprivileged processes can access it and don't step on each other's feet? Is that the reason why it's called a "file system"? (An example would have helped here, showing how it's used.)

It's worth noting that memory hotunplug, memory poisoning and probably more are currently fundamentally incompatible with this approach - which should better be pointed out in the cover letter.

Also, I think something similar can be obtained by using the dax/hmat infrastructure with "memmap=" - at least I remember a talk where this was discussed (but I'm not sure if they modified the firmware to expose selected memory as soft-reserved - we would only need a cmdline parameter to achieve the same - Dan might know more).
> As the figure below shows, we use a kernel boot parameter 'dmem='
> to reserve system memory when the host system boots up. The
> remaining system memory is still managed by the system memory
> management, which is associated with "struct page", while the
> reserved memory will be managed by dmem and assigned to the guest
> system. The details can be checked in
> /Documentation/admin-guide/kernel-parameters.txt.
>
>   +------------------+--------------------------------------+
>   |  system memory   |       memory for guest system        |
>   +------------------+--------------------------------------+
>            |                              |
>            v                              |
>       struct page                         |
>            |                              |
>            v                              v
>   system mem management                 dmem
>
> During usage, dmemfs will handle the memory requests to allocate
> and free the reserved memory on each NUMA node. User space
> applications can leverage the mmap interface to access the memory,
> and kernel modules such as kvm and vfio are able to pin the memory
> through follow_pfn() and get_user_pages() in different given
> page size granularities.

I cannot say that I really like this approach. If all this is about is reducing the memmap size, I really prefer the proposal to free up most vmemmap pages for huge/gigantic pages instead.

-- 
Thanks,

David / dhildenb