Re: [RFC 0/4] RFC - Coherent Device Memory (Not for inclusion)

John Hubbard <jhubbard@xxxxxxxxxx> · Mon, 1 May 2017 13:41:55 -0700

On 04/19/2017 12:52 AM, Balbir Singh wrote:
> This is a request for comments on the discussed approaches
> for coherent memory at mm-summit (some of the details are at
> https://lwn.net/Articles/717601/). The latest posted patch
> series is at https://lwn.net/Articles/713035/. I am reposting
> this as RFC, Michal Hocko suggested using HMM for CDM, but
> we believe there are stronger reasons to use the NUMA approach.
> The earlier patches for Coherent Device memory were implemented
> and designed by Anshuman Khandual.

Hi Balbir,

Although I think everyone agrees that, in the [very] long term, these 
hardware-coherent nodes probably want to be NUMA nodes, we still need a clear idea 
of what has to be done for each possible approach in order to decide what to code 
up over the next few years.

Here, the CDM discussion is falling just a bit short, because it does not yet 
include the whole story of what we would need to do. Earlier threads pointed this 
out: the idea started as a large patchset RFC, but then, "for ease of review", it 
got turned into a smaller RFC, which loses too much context.

So, I'd suggest putting together something more complete, so that it can be fairly 
compared against the HMM-for-hardware-coherent-nodes approach.

> Jerome posted HMM-CDM at https://lwn.net/Articles/713035/.
> The patches do a great deal to enable CDM with HMM, but we
> still believe that HMM with CDM is not a natural way to
> represent coherent device memory and the mm will need
> to be audited and enhanced for it to even work.

That is also true for the CDM approach. Specifically, in order for this to be of any 
use to device drivers, we'll need the following:

1. A way to move pages between NUMA nodes, both virtual address and physical 
address-based, from kernel mode.

2. A way to provide reverse mapping information to device drivers, even if 
indirectly. (I'm not proposing exposing rmap, but this has to be thought through, 
because at some point, a device will need to do something with a physical page.)

This strikes me as the hardest part of the problem.

3. Detection and mitigation of page thrashing between NUMA nodes (shared 
responsibility between core -mm and device driver, but probably missing some APIs 
today).

4. Handling of oversubscription (allocating more memory than is physically on a NUMA 
node, by evicting "LRU-like" pages, rather than the current fallback to other NUMA 
nodes). Similar to (3) with respect to where we're at today.

5. A way to handle bringing NUMA nodes online and taking them offline again, given 
that they depend on a device driver that may not have been loaded yet. A few minor 
pieces are still missing there.
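To make item (3) a bit more concrete, here is a minimal sketch of one possible thrashing heuristic, assuming a hypothetical per-page history that counts migrations within a fixed time window and refuses further migrations once a threshold is hit. None of these names (struct page_migrate_hist, should_migrate(), THRASH_WINDOW_NS, THRASH_MAX_MOVES) exist in the kernel; they are illustrative only, and a real version would have to live in the fault and migration paths, with input from the device driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy knobs; not real kernel tunables. */
#define THRASH_WINDOW_NS  (10ULL * 1000 * 1000) /* 10 ms observation window */
#define THRASH_MAX_MOVES  4u                    /* max migrations per window */

/* Hypothetical per-page migration history (not a kernel structure). */
struct page_migrate_hist {
	uint64_t window_start_ns; /* start of the current observation window */
	unsigned int moves;       /* migrations allowed so far in this window */
};

/*
 * Decide whether a fault at time @now_ns should migrate the page.
 * Assumes @now_ns comes from a monotonic clock (never goes backwards).
 * Once THRASH_MAX_MOVES migrations have happened within one window, the
 * page is treated as thrashing and left where it is until the window ends.
 */
bool should_migrate(struct page_migrate_hist *h, uint64_t now_ns)
{
	if (now_ns - h->window_start_ns >= THRASH_WINDOW_NS) {
		/* Window expired: forget the old history. */
		h->window_start_ns = now_ns;
		h->moves = 0;
	}

	if (h->moves >= THRASH_MAX_MOVES)
		return false; /* thrashing: access the page remotely instead */

	h->moves++;
	return true;
}
```

The fixed window is just the simplest policy to show; the open question in (3) is which side (core -mm or the driver) owns this state and what API connects the two.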

thanks,

--
John Hubbard
NVIDIA
