On Mon, Jan 16, 2017 at 12:13 PM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> On Mon, Jan 16, 2017 at 11:31:39AM -0800, Dan Williams wrote:
[..]
>> >> dev_pagemap is only meant for get_user_pages() to do lookups of ptes
>> >> with _PAGE_DEVMAP and take a reference against the hosting device..
>> >
>> > And I want to build on top of that to extend _PAGE_DEVMAP to support
>> > a new use case for unaddressable device memory.
>> >
>> >> Why can't HMM use the typical vm_operations_struct fault path and
>> >> push more of these details to a driver rather than the core?
>> >
>> > Because the vm_operations_struct has nothing to do with the device.
>> > We are talking about regular vmas here. Think malloc, mmap, shared
>> > memory, ... not mmap(/dev/thedevice, ...).
>> >
>> > So the vm_operations_struct is never under device control and we can
>> > not, nor want to, rely on that.
>>
>> Can you explain more what's behind that "can not, nor want to"
>> statement? It seems to me that any awkwardness of moving to a
>> standalone device file interface is less than maintaining a new /
>> parallel mm fault path through dev_pagemap.
>
> The whole point of HMM is to allow transparent use of a process address
> space on a device like a GPU. That implies any vma (vm_area_struct) that
> results from a usual mmap (either PRIVATE or SHARED, as long as it is
> not an mmap of a device file).
>
> It means that an application can use malloc or the usual memory
> allocation primitives of its language (C++, Rust, Python, ...) and
> directly use the memory it gets from that with the device.

So you need 100% support of all these mm paths for this hardware to be
useful at all? Does a separate device driver and a userspace helper
library get you something like 80% of the functionality, so that we can
then debate the core mm changes needed for the final 20%? Or am I just
completely off base about how people want to use this hardware?

> Devices like GPUs have a large pool of device memory that is not
> accessible by the CPU. This device memory has 10 times more bandwidth
> than system memory and better latency than PCIe. Hence, for the whole
> thing to make sense, you need to allow it to be used.
>
> For that you need to allow migration from system memory to device
> memory. Because you can not rely on a special userspace allocator, you
> have to assume that the vma (vm_area_struct) is a regular one. So we are
> left with having struct pages for the device memory, to allow migration
> to work without requiring too many changes to the existing mm.
>
> Because device memory is not accessible by the CPU, you can not allow
> anyone to pin it, and thus get_user_page* must trigger a migration back,
> just as a CPU page fault would.
>
>> > So what we are looking for here is a struct page that can behave
>> > mostly like any other, except that we do not want to allow GUP to
>> > take a reference: almost exactly what ZONE_DEVICE already provides.
>> >
>> > So do you have any fundamental objections to this patchset? And if
>> > so, how do you propose I solve the problem I am trying to address?
>> > Because the hardware exists today, and without something like HMM we
>> > will not be able to support it.
>>
>> My pushback stems from it being a completely different use case for
>> devm_memremap_pages(), as evidenced by it growing from 4 arguments to
>> 9, and the ongoing maintenance overhead of understanding HMM
>> requirements when updating the pmem usage of ZONE_DEVICE.
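(To be concrete about the existing contract: dev_pagemap's role in gup is
small. A simplified sketch -- illustrative only, not verbatim mm/gup.c,
and the helper name follow_devmap_pte() is made up for the example -- of
how get_user_pages() handles a pte with _PAGE_DEVMAP set:)

	/*
	 * Sketch: gup looks the pfn up in the pgmap radix and pins the
	 * hosting device before taking a page reference.
	 */
	static struct page *follow_devmap_pte(pte_t pte)
	{
		struct dev_pagemap *pgmap;
		struct page *page;

		if (!pte_devmap(pte))
			return NULL;

		/* takes a percpu_ref against the device hosting this pfn */
		pgmap = get_dev_pagemap(pte_pfn(pte), NULL);
		if (!pgmap)
			return ERR_PTR(-EFAULT);

		page = pte_page(pte);
		get_page(page);

		/* the page reference is what we hand out; drop the pgmap ref */
		put_dev_pagemap(pgmap);
		return page;
	}
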
> I would rather reuse something existing and modify it to support more
> use cases than try to add a ZONE_DEVICE2 or ZONE_DEVICE_I_AM_DIFFERENT.
> I have made sure that my modifications to ZONE_DEVICE can be used
> without HMM. It is just a generic interface to support page faults and
> to allow tracking the last user of a device page. Both can be used
> independently of each other.
>
> To me the whole point of the kernel is trying to share infrastructure
> across as much hardware as possible, and I am doing just that. I do not
> think HMM should be blocked because something that used to be for one
> specific use case now supports two use cases. I am not breaking anything
> existing. Is it more work for you? Maybe, but at Red Hat we intend to
> support it for as long as it is needed, so you will always have someone
> to talk to if you want to update ZONE_DEVICE.

Sharing infrastructure should not come at the expense of type safety and
clear usage rules. For example, before exposing ZONE_DEVICE memory to
other parts of the kernel, the pmem case introduced the pfn_t type to
distinguish DMA-capable pfns from other raw pfns. All programmatic ways of
discovering whether a pmem range can support DMA use this type and its
explicit flags (see the sketch at the end of this mail).

While we may not need a ZONE_DEVICE2, we obviously need a different
wrapper around arch_add_memory() than devm_memremap_pages() for HMM, and
likely a different physical address radix than pgmap_radix, because they
are servicing two distinct purposes.

For example, I don't think HMM should be using an unmodified
arch_add_memory(). We shouldn't add unaddressable memory to the linear
address mappings when we know there is nothing behind it, especially when
it seems that all you need from arch_add_memory() is for pfn_to_page() to
be valid.
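(Concretely, the pfn_t pattern referenced above looks roughly like this --
an illustrative sketch, not lifted from any particular driver; the helper
names pmem_to_pfn_t() and pfn_dma_capable() are made up for the example:)

	#include <linux/pfn_t.h>

	/*
	 * A producer tags the pfn with explicit capability flags:
	 * PFN_DEV says "not ordinary system RAM", PFN_MAP says "has a
	 * struct page" (i.e. ZONE_DEVICE-backed, so DMA can work).
	 */
	static pfn_t pmem_to_pfn_t(phys_addr_t addr)
	{
		return phys_to_pfn_t(addr, PFN_DEV | PFN_MAP);
	}

	/*
	 * Consumers test the flags instead of trusting a raw pfn.
	 */
	static bool pfn_dma_capable(pfn_t pfn)
	{
		return pfn_t_has_page(pfn) && pfn_t_devmap(pfn);
	}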