On 29.03.19 09:45, Oscar Salvador wrote:
> On Thu, Mar 28, 2019 at 04:31:44PM +0100, David Hildenbrand wrote:
>> Correct me if I am wrong. I think I was confused - vmemmap data is still
>> allocated *per memory block*, not for the whole added memory, correct?
>
> No, vmemmap data is allocated per memory-resource added.
> In case a DIMM, would be a DIMM, in case a qemu memory-device, would be that
> memory-device.
> That is counting that ACPI does not split the DIMM/memory-device in several memory
> resources.
> If that happens, then acpi_memory_enable_device() calls __add_memory for every
> memory-resource, which means that the vmemmap data will be allocated per
> memory-resource.
> I did not see this happening though, and I am not sure under which circumstances
> can happen (I have to study the ACPI code a bit more).
>
> The problem with allocating vmemmap data per memblock, is the fragmentation.
> Let us say you do the following:
>
> * memblock granularity 128M
>
> (qemu) object_add memory-backend-ram,id=ram0,size=256M
> (qemu) device_add pc-dimm,id=dimm0,memdev=ram0,node=1
>
> This will create two memblocks (2 sections), and if we allocate the vmemmap
> data for each corresponding section within its section (memblock), you only get
> 126M contiguous memory.

Oh okay, so actually the way I guessed it would be now.

While this makes total sense, I'll have to look at how it is currently
handled, meaning if there is a change. I somewhat remember that delayed
struct page initialization would initialize the vmemmap per section, not
per memory resource.

But as I work on 10 different things, my mind sometimes seems to forget
stuff in order to replace it with random nonsense. Will look into the
details to not have to ask too many dumb questions.

>
> So, the taken approach is to allocate the vmemmap data corresponding to the
> whole DIMM/memory-device/memory-resource from the beginning of its memory.
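
To double-check the numbers for the 256M example, here is a rough sketch
(plain Python, not kernel code) comparing the two placements: vmemmap
carved out of each section versus vmemmap placed once at the start of the
whole memory resource. It assumes 4 KiB pages and a 64-byte struct page,
which is what "2MB per section" for a 128M section implies; the function
names are made up for illustration, and the resource size is assumed to be
a multiple of the section size.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB base pages
STRUCT_PAGE_SIZE = 64       # assumption: sizeof(struct page) == 64 bytes
SECTION_SIZE = 128 << 20    # memblock/section granularity from the example


def memmap_pages(nr_bytes):
    """Pages needed to hold the memmap array that describes nr_bytes."""
    nr_pages = nr_bytes // PAGE_SIZE
    memmap_bytes = nr_pages * STRUCT_PAGE_SIZE
    return -(-memmap_bytes // PAGE_SIZE)  # round up to whole pages


def usable_ranges(resource_bytes, per_section):
    """Usable (first_pfn, last_pfn) ranges, pfns relative to resource start.

    per_section=True:  each section hosts its own memmap at its start.
    per_section=False: one memmap for the whole resource at its start.
    """
    total_pages = resource_bytes // PAGE_SIZE
    if not per_section:
        return [(memmap_pages(resource_bytes), total_pages - 1)]

    pages_per_section = SECTION_SIZE // PAGE_SIZE
    overhead = memmap_pages(SECTION_SIZE)
    return [
        (base + overhead, base + pages_per_section - 1)
        for base in range(0, total_pages, pages_per_section)
    ]


def largest_contiguous_mib(ranges):
    """Size in MiB of the largest usable contiguous range."""
    return max((last - first + 1) * PAGE_SIZE for first, last in ranges) >> 20


dimm = 256 << 20
print(usable_ranges(dimm, per_section=True))
print(largest_contiguous_mib(usable_ranges(dimm, per_section=True)))
print(usable_ranges(dimm, per_section=False))
print(largest_contiguous_mib(usable_ranges(dimm, per_section=False)))
```

Under those assumptions the per-section placement leaves two separate
usable ranges, while the per-resource placement leaves a single one that
starts right after the memmap.
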
>
> In the example from above, the vmemmap data for both sections is allocated from
> the beginning of the first section:
>
> memmap array takes 2MB per section, so 512 pfns.
> If we add 2 sections:
>
> [ pfn#0     ]  \
> [ ...       ]  |  vmemmap used for memmap array
> [ pfn#1023  ]  /
>
> [ pfn#1024  ]  \
> [ ...       ]  |  used as normal memory
> [ pfn#65535 ]  /
>
> So, out of 256M, we get 252M to use as a real memory, as 4M will be used for
> building the memmap array.
>
> Actually, it can happen that depending on how big a DIMM/memory-device is,
> the first/s memblock is fully used for the memmap array (of course, this
> can only be seen when adding a huge DIMM/memory-device).
>

Just stating here that, with your code, add_memory() and remove_memory()
always have to be called with the same granularity. Will have to see if
that implies a change.

-- 

Thanks,

David / dhildenb