On 04/19/2017 12:52 AM, Balbir Singh wrote:
> This is a request for comments on the discussed approaches
> for coherent memory at mm-summit (some of the details are at
> https://lwn.net/Articles/717601/). The latest posted patch
> series is at https://lwn.net/Articles/713035/. I am reposting
> this as RFC, Michal Hocko suggested using HMM for CDM, but
> we believe there are stronger reasons to use the NUMA approach.
> The earlier patches for Coherent Device memory were implemented
> and designed by Anshuman Khandual.
Hi Balbir,
Although I think everyone agrees that, in the [very] long term, these
hardware-coherent nodes probably want to be NUMA nodes, we still need a clear idea of
what has to be done for each possible approach, so that we can decide what to code up
over the next few years.
Here, the CDM discussion is falling just a bit short, because it does not yet
include the whole story of what we would need to do. Earlier threads pointed this
out: the idea started as a large patchset RFC, but then, "for ease of review", it
got turned into a smaller RFC, which loses too much context.
So, I'd suggest putting together something more complete, so that it can be fairly
compared against the HMM-for-hardware-coherent-nodes approach.
> Jerome posted HMM-CDM at https://lwn.net/Articles/713035/.
> The patches do a great deal to enable CDM with HMM, but we
> still believe that HMM with CDM is not a natural way to
> represent coherent device memory and the mm will need
> to be audited and enhanced for it to even work.
That is also true for the CDM approach. Specifically, in order for this to be of any
use to device drivers, we'll need the following:
1. A way to move pages between NUMA nodes from kernel mode, both by virtual address
and by physical address.
2. A way to provide reverse mapping information to device drivers, even if
indirectly. (I'm not proposing exposing rmap, but this has to be thought through,
because at some point, a device will need to do something with a physical page.)
This strikes me as the hardest part of the problem.
3. Detection and mitigation of page thrashing between NUMA nodes (shared
responsibility between core -mm and device driver, but probably missing some APIs
today).
4. Handling of oversubscription (allocating more memory than is physically on a NUMA
node, by evicting "LRU-like" pages, rather than the current fallback to other NUMA
nodes). Similar to (3) with respect to where we're at today.
5. Something to handle the story of bringing NUMA nodes online and putting them back
offline, given that they require a device driver that may not yet have been loaded.
There are a few minor missing bits there.
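To make item (3) a bit more concrete, here is a toy userspace model of one possible
thrash-detection policy: count migrations per page inside a time window and refuse
further moves once a limit is hit. Everything in it is hypothetical (struct
page_track, page_track_migrate(), the window and limit values); it is not kernel
code and not an existing or proposed API, only a sketch of the kind of bookkeeping
that core -mm and a driver would have to share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define THRASH_WINDOW_TICKS 100u  /* length of one observation window */
#define THRASH_LIMIT        3u    /* migrations allowed per window */

/* Hypothetical per-page tracking state; not a real kernel structure. */
struct page_track {
	int      node;          /* node the page currently lives on */
	uint64_t window_start;  /* tick at which the current window began */
	unsigned migrations;    /* migrations performed in this window */
	bool     pinned;        /* set once the page is judged to be thrashing */
};

static void page_track_init(struct page_track *t, int node, uint64_t now)
{
	assert(t != NULL);
	t->node = node;
	t->window_start = now;
	t->migrations = 0;
	t->pinned = false;
}

/*
 * Ask to move the page to 'target' at time 'now'.
 * Returns true if the (simulated) migration happened, false if the page
 * is already on 'target' or the request was refused as thrashing.
 */
static bool page_track_migrate(struct page_track *t, int target, uint64_t now)
{
	assert(t != NULL);

	if (t->node == target)
		return false;

	/* A new window forgives earlier migrations and lifts the pin. */
	if (now - t->window_start >= THRASH_WINDOW_TICKS) {
		t->window_start = now;
		t->migrations = 0;
		t->pinned = false;
	}

	/* Too many moves in this window: leave the page where it is. */
	if (t->migrations >= THRASH_LIMIT) {
		t->pinned = true;
		return false;
	}

	t->node = target;
	t->migrations++;
	return true;
}
```

With these values, a page bounced between node 0 and node 1 is moved three times,
then stays put until the window expires. A real mitigation would also have to decide
which node "wins", and that is where the device driver needs a say.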
thanks,
--
John Hubbard
NVIDIA