On Wed, Feb 22, 2017 at 10:29:21AM +0100, Michal Hocko wrote:
> On Tue 21-02-17 18:39:17, Anshuman Khandual wrote:
> > On 02/17/2017 07:02 PM, Mel Gorman wrote:
> [...]
> [...]
> > These are the reasons which prohibit the use of HMM for coherent
> > addressable device memory purpose.
> >
> [...]
> > (3) Application cannot directly allocate into device memory from user
> > space using existing memory related system calls like mmap() and mbind()
> > as the device memory hides away in ZONE_DEVICE.
>
> Why cannot the application simply use mmap on the device file?

This has been said before, but we want to share the address space, and
this implies that you cannot rely on a special allocator. For instance,
you can have an application that uses a library, and the library uses
the GPU, but the application is unaware of it. Thus any data provided by
the application to the library will come from generic malloc (mmap
anonymous or from a regular file).

Currently what happens is that the library reallocates memory through a
special allocator and copies things over. Not only does this waste
memory (the new memory is often regular memory too), but you also have
to pay the cost of copying GBs of data.

The last point is complex data structures (lists, trees, ...): having
to go through a special allocator means you have to rebuild the whole
structure in the duplicated memory.

Allowing direct use of memory allocated from malloc (mmap anonymous
private or from a regular file) avoids the copy operation and the
complex duplication of data structures. Moving the dataset to the GPU
is then a simple memory migration from the kernel's point of view.

A shared address space without a special allocator is mandatory in new
or future standards such as OpenCL, CUDA, C++, OpenMP, ... Some other
OSes already have this and the industry wants it. So the question is:
do we want to support any of this, do we care about GPGPU?

I believe we want to support all these new standards, but maybe I am
the only one.

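To make the cost of the special-allocator path concrete, here is a
minimal userspace sketch in C. Everything in it is hypothetical and for
illustration only: dev_arena and dev_alloc_node() merely stand in for a
device-specific allocator, and no real GPU or kernel interface is
involved. It shows that a malloc'ed list has to be rebuilt node by node
to live in "device" memory, while with a shared address space the
original pointer could be handed over as is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ordinary linked list, built by the application with plain malloc(). */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical stand-in for a device-specific allocator: a fixed arena. */
#define DEV_ARENA_NODES 64
static struct node dev_arena[DEV_ARENA_NODES];
static size_t dev_used;

static struct node *dev_alloc_node(void)
{
    if (dev_used >= DEV_ARENA_NODES)
        return NULL;
    return &dev_arena[dev_used++];
}

void free_list(struct node *head)
{
    while (head) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Application side: build the list 1..n with generic malloc(). */
struct node *build_list(int n)
{
    struct node *head = NULL;

    for (int i = n; i >= 1; i--) {
        struct node *nd = malloc(sizeof(*nd));
        if (!nd) {
            free_list(head);
            return NULL;
        }
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    return head;
}

/*
 * Special-allocator path: every node is duplicated into the arena and
 * every next pointer is rewritten to point at the duplicate.
 */
struct node *copy_to_device(const struct node *head, size_t *bytes_copied)
{
    struct node *dev_head = NULL;
    struct node **tail = &dev_head;

    assert(bytes_copied != NULL);
    *bytes_copied = 0;
    for (; head; head = head->next) {
        struct node *nd = dev_alloc_node();
        if (!nd)
            return NULL;
        nd->value = head->value;
        nd->next = NULL;
        *tail = nd;
        tail = &nd->next;
        *bytes_copied += sizeof(*nd);
    }
    return dev_head;
}

/* The "library" work: it only needs a valid pointer to walk the list. */
long walk_and_sum(const struct node *head)
{
    long sum = 0;

    for (; head; head = head->next)
        sum += head->value;
    return sum;
}
```

With a shared address space the library would call walk_and_sum() on
the application's own list; the copy_to_device() step, and the second
copy of the data it creates, would not be needed at all.
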
In the HMM case I have the extra painful fact that the device memory is
not accessible by the CPU. For CDM, on the contrary, the CPU can access
the device memory in a cache-coherent way and all operations behave as
they do on regular memory (things like atomic operations, for instance).

I hope this clearly explains why we can no longer rely on
dedicated/specialized memory allocators.

Cheers,
Jérôme