> -----Original Message-----
> From: Dave Hansen [mailto:dave@xxxxxxxx]
> Sent: Wednesday, July 24, 2013 12:43 PM
> To: KY Srinivasan
> Cc: Dave Hansen; Michal Hocko; gregkh@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx;
> apw@xxxxxxxxxxxxx; andi@xxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; kamezawa.hiroyuki@xxxxxxxxx; hannes@xxxxxxxxxxx;
> yinghan@xxxxxxxxxx; jasowang@xxxxxxxxxx; kay@xxxxxxxx
> Subject: Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
> memory blocks
>
> On 07/23/2013 10:21 AM, KY Srinivasan wrote:
> >> You have allocated some large, physically contiguous areas of memory
> >> under heavy pressure. But you also contend that there is too much
> >> memory pressure to run a small userspace helper. Under heavy memory
> >> pressure, I'd expect large, kernel allocations to fail much more often
> >> than running a small userspace helper.
> >
> > I am only reporting what I am seeing. Broadly, I have two main failure
> > conditions to deal with: (a) resource-related failure (add_memory()
> > returning -ENOMEM) and (b) not being able to online a segment that has
> > been successfully hot-added. I have seen both these failures under high
> > memory pressure. By supporting "in context" onlining, we can eliminate
> > one failure case. Our inability to online is not a recoverable failure
> > from the host's point of view - the memory is committed to the guest
> > (since hot add succeeded) but is not usable since it is not onlined.
>
> Could you please precisely report on what you are seeing in detail?
> Where are the -ENOMEMs coming from? Which allocation site? Are you
> seeing OOMs or page allocation failure messages on the console?

The -ENOMEM failure I see is from the call to hot-add memory - the call to
add_memory(). Usually I don't see any OOM messages on the console.

> The operation was split up into two parts for good reason.
> It's actually for your _precise_ use case.

I agree; without this split, I could not have implemented the balloon
driver with hot-add.

> A system under memory pressure is going to have troubles doing a
> hot-add. You need memory to add memory. Of the two operations ("add"
> and "online"), "add" is the one vastly more likely to fail. It has to
> allocate several large swaths of contiguous physical memory. For that
> reason, the system was designed so that you could "add" and "online"
> separately. The intention was that you could "add" far in advance and
> then "online" under memory pressure, with the "online" having *VASTLY*
> smaller memory requirements and being much more likely to succeed.
>
> You're lumping the "allocate several large swaths of contiguous physical
> memory" failures in to the same class as "run a small userspace helper".
> They are _really_ different problems. Both prone to allocation
> failures for sure, but _very_ separate problems. Please don't conflate
> them.

I don't think I am conflating these two issues; I am sorry if I gave that
impression. All I am saying is that I see two classes of failures:
(a) our inability to allocate memory to manage the memory that is being
hot-added, and (b) our inability to bring the hot-added memory online
within a reasonable amount of time. I am not sure of the cause of (b);
I was just speculating that it could be memory related. What is
interesting is that I have seen failures to online the memory even after
the hot add itself succeeded.

> >> It _sounds_ like you really want to be able to have the host retry the
> >> operation if it fails, and you return success/failure from inside the
> >> kernel. It's hard for you to tell if running the userspace helper
> >> failed, so your solution is to move what was previously done in
> >> userspace in to the kernel so that you can more easily tell if it
> >> failed or succeeded.
> >>
> >> Is that right?
> >
> > No; I am able to get the proper error code for recoverable failures
> > (hot-add failures because of lack of memory). By doing what I am
> > proposing here, we can avoid one class of failures completely, and I
> > think this is what resulted in a better "hot add" experience in the
> > guest.
>
> I think you're taking a huge leap here: "We could not online memory,
> thus we must take userspace out of the loop."
>
> You might be right. There might be only one way out of this situation.
> But you need to provide a little more supporting evidence before we all
> arrive at the same conclusion.

I am not even suggesting that. All I am saying is that there should be a
mechanism for "in context" onlining of memory - that is, onlining from a
kernel context - in addition to the existing sysfs mechanism. The Hyper-V
balloon driver can certainly use this functionality. I should be sending
out the patches for this shortly.

> BTW, it doesn't _require_ udev. There could easily be another listener
> for hotplug events.

Agreed; but structurally it is identical to having a udev rule.

Regards,

K. Y

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@xxxxxxxxx. For more info on Linux MM, see:
http://www.linux-mm.org/ .
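For readers following the thread: the userspace onlining path under
discussion boils down to writing the string "online" to a memory block's
sysfs state file. The sketch below performs that write against a mock
directory so it is runnable anywhere; on a real system the path is
/sys/devices/system/memory/memoryN/state (root required), and "memory32"
is an illustrative block number, not one from this thread.

```shell
# Mock the sysfs memory-block layout (the real tree lives under
# /sys/devices/system/memory and is created by the kernel on hot-add).
mock=$(mktemp -d)
mkdir -p "$mock/memory32"
echo offline > "$mock/memory32/state"

# The "online" step a udev rule or admin performs after a successful
# add_memory(): a single write to the block's state file.
echo online > "$mock/memory32/state"

cat "$mock/memory32/state"   # -> online
```

On real hardware a udev rule matching memory-subsystem "add" events can
issue this same write automatically; that is the userspace-helper approach
the thread weighs against onlining directly from kernel context.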