On Fri, 2017-05-05 at 16:52 +0200, Michal Hocko wrote:
> On Thu 04-05-17 17:49:21, Benjamin Herrenschmidt wrote:
> > On Thu, 2017-05-04 at 14:52 +0200, Michal Hocko wrote:
> > > But the direct reclaim would be effective only _after_ all other
> > > nodes are full.
> > >
> > > I thought that kswapd reclaim is a problem because the HW doesn't
> > > support aging properly but as the direct reclaim works then what
> > > is the actual problem?
> >
> > Ageing isn't completely broken. The ATS MMU supports dirty/accessed
> > just fine.
> >
> > However the TLB invalidations are quite expensive with a GPU, so too
> > much harvesting is detrimental, and the GPU tends to check pages out
> > using a special "read with intent to write" mode, which means it
> > almost always sets the dirty bit if the page is writable to begin
> > with.
>
> This sounds pretty much like a HW specific detail which is not the
> right criterion to design general CDM around.

I think Ben answered several of these questions. NUMA, we felt, was the
best representation of such memory, but it has limitations: we'd like to
isolate the CDM node from some of the default algorithms that run on all
nodes marked N_MEMORY. Do you see that as a concern? Would you like to
see a generic policy, as Ben said, to handle node attributes like
reclaim, autonuma, etc.?

> So let me repeat the fundamental question. Is the only difference from
> cpuless nodes the fact that the node should be invisible to processes
> unless they specify an explicit node mask? If yes then we are talking
> about policy in the kernel and that sounds like a big no-no to me.
> Moreover cpusets already support exclusive numa nodes AFAIR.

Why do you see this as policy? It's a mechanism for isolating nodes; the
nodes themselves are then used via mempolicy.

> I am either missing something important here, and the discussion so far
> hasn't helped to be honest, or this whole CDM effort tries to build a
> generic interface around a _specific_ piece of HW. The matter is worse
> by the fact that the described usecases are so vague that it is hard to
> build a good picture whether this is generic enough that a new/different
> HW will still fit into this picture.

The use case is similar to HMM, except that we've got coherent memory.
We treat it as important and want to isolate it from normal allocations,
unless the allocation explicitly specifies the node. Cpusets provide an
isolation mechanism, but we see autonuma, for example, moving pages away
when there is an access from the system side. With reclaim, it would be
better to use the fallback list first and then swap.

Again, the use case is as follows; I'm trying to do a FAQ version here:

Isolate memory - why?
- CDM memory is not meant for normal usage; applications request it
  explicitly (a minimal sketch of such a request follows below) and
  offload their compute to the device where the memory is (the offload
  is via a user space API like CUDA/OpenCL/...).
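To make "request it explicitly" concrete, here is a minimal userspace
sketch, assuming the coherent device memory shows up as an ordinary NUMA
node and that placement on it is opt-in via mempolicy. The node id is
hypothetical; a real application would discover it via the device driver
or sysfs.

#define _GNU_SOURCE
#include <numaif.h>     /* mbind(), MPOL_BIND -- link with -lnuma */
#include <sys/mman.h>   /* mmap() */
#include <stdio.h>

int main(void)
{
    /* Hypothetical node id of the CDM node; in practice it would be
     * discovered via the device driver or sysfs. */
    const unsigned long cdm_node = 1;
    unsigned long nodemask = 1UL << cdm_node;
    size_t len = 1UL << 20;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Bind the range to the CDM node only.  Nothing is placed on the
     * device node unless the application asks for it like this;
     * default allocations stay in system memory. */
    if (mbind(buf, len, MPOL_BIND, &nodemask,
              sizeof(nodemask) * 8, 0) != 0) {
        perror("mbind");
        return 1;
    }

    /* First touch after the bind faults the pages in on the CDM node. */
    ((volatile char *)buf)[0] = 1;
    return 0;
}

The same effect can be had for a whole task with set_mempolicy() or
numactl --membind; the point of the sketch is only that CDM placement
happens when the application asks for it, not by default.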
How do we isolate - NUMA or HMM?
- Since the memory is coherent, NUMA provides the mechanism to isolate
  to a large extent via mempolicy. With NUMA we also get
  autonuma/kswapd/etc. running on the node, which is something we would
  like to avoid. NUMA gives the application a transparent view of
  memory, in the sense that all mm features work: direct page cache
  allocation in coherent device memory, limiting memory via cgroups if
  required, etc. With cpusets it is possible for us to isolate
  allocation. One challenge is that the admin on the system may use
  them differently, and applications need to be aware of running in the
  right cpuset to allocate memory from the CDM node. Putting all
  applications in the cpuset containing the CDM node is not the right
  thing to do, which means the application would need to move itself
  into the right cpuset before requesting CDM memory. It's not
  impossible to use cpusets, just hard to configure correctly.
- With HMM, we would need an HMM variant, HMM-CDM, so that we are not
  marking the pages as unavailable; page cache cannot go directly to
  coherent memory, and an audit of mm paths is required. Most of the
  other things should work. User access to HMM-CDM memory behind
  ZONE_DEVICE is via a device driver.

Why do we need migration?
- Depending on where the memory is being accessed from, we would like
  to migrate pages between system and coherent device memory. HMM
  provides DMA offload capability that is useful in both cases.

What is the larger picture - end to end?
- Applications can allocate memory on the device or in system memory
  and offload the compute via a user space API. Migration can be used
  for performance if required, since it helps to keep the memory local
  to the compute.

Ben/Jerome/John/others, did I get the FAQ right?

From my side, I want to ensure that the decision between HMM-CDM and
NUMA-CDM is based on our design and understanding, as opposed to the
use case being unclear or insufficient. I'd be happy if we said: we
understand the use case and believe that HMM-CDM is better from the
mm's perspective because... as opposed to isolating NUMA attributes
because..., or vice versa.

Thanks for the review,
Balbir Singh