On Mon, Dec 7, 2020 at 4:03 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 07.12.20 12:30, yulei.kernel@xxxxxxxxx wrote:
> > From: Yulei Zhang <yuleixzhang@xxxxxxxxxxx>
> >
> > In current system each physical memory page is associated with
> > a page structure which is used to track the usage of this page.
> > But due to the memory usage rapidly growing in cloud environment,
> > we find the resource consuming for page structure storage becomes
> > more and more remarkable. So is it possible that we could reclaim
> > such memory and make it reusable?
> >
> > This patchset introduces an idea about how to save the extra
> > memory through a new virtual filesystem -- dmemfs.
> >
> > Dmemfs (Direct Memory filesystem) is device memory or reserved
> > memory based filesystem. This kind of memory is special as it
> > is not managed by kernel and most important it is without 'struct page'.
> > Therefore we can leverage the extra memory from the host system
> > to support more tenants in our cloud service.
>
> "is not managed by kernel" well, it's obviously managed by the
> kernel. It's not managed by the buddy ;)
>
> How is this different to using "mem=X" and mapping the relevant memory
> directly into applications? Is this "simply" a control instance on top
> that makes sure unprivileged process can access it and not step onto
> each other's feet? Is that the reason why it's called a "file system"?
> (an example would have helped here, showing how it's used)
>
> It's worth noting that memory hotunplug, memory poisoning and probably
> more is currently fundamentally incompatible with this approach - which
> should better be pointed out in the cover letter.
>
> Also, I think something similar can be obtained by using dax/hmat
> infrastructure with "memmap=", at least I remember a talk where this was
> discussed (but not sure if they modified the firmware to expose selected
> memory as soft-reserved - we would only need a cmdline parameter to
> achieve the same - Dan might know more).

There is currently the efi_fake_mem parameter that can add the
"EFI_MEMORY_SP" attribute on EFI platforms:

    efi_fake_mem=4G@9G:0x40000

...this results in a /dev/dax instance that can be further partitioned
via the device-dax sub-division facility merged for 5.10. That could be
generalized to something else for non-EFI platforms, but there has not
been a justification to go that route yet.

Joao pointed this out in a previous posting of DMEMFS, and I have yet to
see an explanation of incremental benefit the kernel gains from having
yet another parallel memory management interface.