On Tue, Dec 04, 2018 at 01:47:17PM -0700, Logan Gunthorpe wrote:
>
> On 2018-12-04 1:14 p.m., Andi Kleen wrote:
> >> Also, in the same vein, I think it's wrong to have the API enumerate all
> >> the different memory available in the system. The API should simply
> >
> > We need an enumeration API too, just to display to the user what they
> > have, and possibly for applications to size their buffers
> > (all we do with existing NUMA nodes)
>
> Yes, but I think my main concern is the conflation of the enumeration
> API and the binding API. An application doesn't want to walk through all
> the possible memory and types in the system just to get some memory that
> will work with a couple initiators (which it somehow has to map to
> actual resources, like fds). We also don't want userspace to police
> itself on which memory works with which initiator.

How would an application police itself? The API I am proposing is best
effort, and as such the kernel can fully ignore a userspace request, as
it sometimes does now with mbind(). So the kernel always has the last
call and can always override the application's decision. A device driver
can also decide to override; anything on the kernel side really has more
power than userspace would have. So while we give trust to userspace, we
do not abdicate control. That is not the intention here.

> Enumeration is definitely not the common use case. And if we create a
> new enumeration API now, it may make it difficult or impossible to unify
> these types of memory with the existing NUMA node hierarchies if/when
> this gets more integrated with the mm core.

The point I am trying to make is that it cannot get integrated as a
regular NUMA node inside the mm core. Rather, the mm core can grow to
encompass non-NUMA-node memory.
I explained why in another part of this thread, but roughly:
  - Device drivers need to be in control of device memory allocation
    for backward compatibility reasons and to keep fulfilling things
    like graphics API constraints (OpenGL, Vulkan, X, ...).
  - Adding a new node type is problematic inside mm, as we are running
    out of bits in struct page.
  - Excluding nodes from the regular allocation path was rejected by
    upstream previously (IBM did post a patchset for that, IIRC).

I feel it is a safer path to avoid a one-model-fits-all approach here
and to accept that device memory will be represented and managed in a
different way from other memory. I believe the persistent memory folks
feel the same on that front.

Nonetheless, I do want to expose this device memory in a standard way so
that we can consolidate and improve the user experience on that front.
Eventually I hope that more of the device memory management can be
turned into common device memory management inside core mm, but I do not
want to enforce that at first, as it is likely to fail (building a
moonbase before you have a moon rocket). I would rather grow organically
from a high-level API that will get used right away (it is a matter of
converting existing users to it: s/computeAPIBind/HMSBind).

Cheers,
Jérôme