Re: [Bug 25042] New: RAM buffer I/O resource badly interacts with memory hot-add

On Tue, Jan 4, 2011 at 2:32 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Jan 4, 2011 at 1:51 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> Linus's commit 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 in May 2009 added code
>>> to create 'RAM buffer' above top of RAM to ensure that I/O resources do not
>>> start immediately after RAM, but sometime later.  Originally it was enforcing
>>> 32MB alignment, now it enforces 64MB.  Which means that in VMs with memory size
>>> which is not multiple of 64MB there will be additional 'RAM buffer' resource
>>> present:
>>>
>>> 100000000-1003fffff : System RAM
>>> 100400000-103ffffff : RAM buffer
>
> I'd suggest just working around it by hotplugging in 64MB chunks.

Unfortunately that does not work: kernels configured for sparsemem
refuse memory added in chunks smaller than the section size - on
x86-64 the region must end on a 128MB boundary and be at least 128MB
large.  If a smaller region is added, then either non-existent memory
is activated, or nothing happens at all, depending on the exact values
and the kernel version.  So we align the end of the hot-added region
to 128MB on x86-64, and to 1GB on ia32.  But we do not align the
start, because there was no need...
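The end-alignment rule above can be sketched roughly as follows (the 128MB section size is from the text; the helper name and example request size are mine):

```python
# Sketch of the sparsemem constraint described above: on x86-64 a
# hot-added region must end on a 128MB section boundary, so the end
# of a request is rounded up.  Helper name is illustrative.

SECTION_SIZE = 128 << 20  # 128MB sparsemem section on x86-64

def align_hotadd_end(start, size, section=SECTION_SIZE):
    """Round the end of a hot-add request up to the next section boundary."""
    end = start + size
    return (end + section - 1) // section * section

# Power-on RAM ends at 0x100400000 (4GB + 4MB); a 60MB hot-add request
# gets its end rounded up to the next 128MB boundary.
print(hex(align_hotadd_end(0x100400000, 60 << 20)))  # -> 0x108000000
```

Note the start is left as-is, which is exactly why the RAM buffer clash appears.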

> IOW, the old "it hurts when I do that - don't do that then" solution
> to the problem. There is no reason why a VM should export some random
> 8MB-aligned region that I can see.

It just adds memory where it ended: power-on memory ended at
0x1003fffff, so the platform naturally tries to continue where it
left off - from 0x100400000 to 0x10fffffff.  It has no idea that the
OS inside has some special requirements, and the OS inside
unfortunately does not support _PRS/_SRS on memory devices either, so
we cannot offer possible choices and hope that the guest will pick
one it likes better than the default placement/size.
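The clash can be seen with a trivial range check (ranges are taken from the resource dump quoted above; the hot-add end is illustrative):

```python
# Illustrative sketch: the hypervisor continues hot-add right where
# power-on RAM ended, which lands inside the kernel's "RAM buffer"
# reservation.  Ranges are half-open (start, end).

def overlaps(a, b):
    """True if half-open ranges a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]

system_ram = (0x100000000, 0x100400000)   # power-on System RAM
ram_buffer = (0x100400000, 0x104000000)   # 64MB-aligned RAM buffer resource
hot_added  = (0x100400000, 0x110000000)   # platform continues from RAM end

print(overlaps(hot_added, ram_buffer))    # the new region hits the buffer
```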

> That said, I do repeat: why the hell do you keep digging that hole in
> the first place. Do memory hotplug in 256MB chunks, naturally aligned,
> and don't bother with any of this crazy crap.

So that we can provide a contiguous memory area to the VM, and so
that the layout of a VM created with some amount of memory is the
same as that of a VM hot-added up to the same size - that's important
for supporting hibernate, and it is easier to implement than
discontiguous ranges.

I've modified the code so that we hot-add two regions: first one that
aligns the memory size to 256MB (that one does not activate
successfully if the memory size is not a multiple of 64MB, but we
cannot go smaller due to the sparsemem restrictions listed above),
and then the remainder from there, if more than 256MB is being added.
That makes the workaround similar to the clash between OPROM base
addresses assigned by the kernel and ranges reserved in SRAT for
memory hot-add...
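The two-region split described above might look roughly like this (the 256MB target comes from the text; the helper name and example sizes are mine):

```python
# Sketch of the two-region workaround: a first region that brings the
# end up to a 256MB boundary, then the remainder from there, which
# then starts 256MB-aligned.  Helper name is illustrative.

CHUNK = 256 << 20  # target 256MB alignment for hot-add

def split_hotadd(start, size, chunk=CHUNK):
    """Split a hot-add request into an aligning head plus the remainder."""
    end = start + size
    head_end = min((start + chunk - 1) // chunk * chunk, end)
    regions = []
    if head_end > start:
        regions.append((start, head_end))   # head: align end to 256MB
    if end > head_end:
        regions.append((head_end, end))     # remainder, aligned start
    return regions

# e.g. hot-adding 512MB starting at 4GB + 4MB yields two regions:
for s, e in split_hotadd(0x100400000, 512 << 20):
    print(hex(s), hex(e))
```

If the request starts already aligned, the head region is empty and only the remainder is added.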

Petr


