On 01/30/2017 11:04 PM, Dave Hansen wrote: > On 01/29/2017 07:35 PM, Anshuman Khandual wrote: >> * CDM node's zones are not part of any other node's FALLBACK zonelist >> * CDM node's FALLBACK list contains it's own memory zones followed by >> all system RAM zones in regular order as before >> * CDM node's zones are part of it's own NOFALLBACK zonelist > > This seems like a sane policy for the system that you're describing. > But, it's still a policy, and it's rather hard-coded into the kernel. Right. In the original RFC which I had posted in October, I had thought about this issue and created 'pglist_data->coherent_device' as a u64 element where each bit in the mask can indicate a specific policy request for the hot plugged coherent device. But it looked too complicated in for the moment in absence of other potential coherent memory HW which really requires anything other than isolation and explicit allocation method. > Let's say we had a CDM node with 100x more RAM than the rest of the > system and it was just as fast as the rest of the RAM. Would we still > want it isolated like this? Or would we want a different policy? Though in this particular case this CDM can be hot plugged into the system as a normal NUMA node (I dont see any reason why it should not be treated as normal NUMA node) but I do understand the need for different policy requirements for different kind of coherent memory. But then the other argument being, dont we want to keep this 100X more memory isolated for some special purpose to be utilized by specific applications ? There is a sense that if the non system RAM memory is coherent and similar there cannot be much differences between what they would expect from the kernel. > > Why do we need this hard-coded along with the cpuset stuff later in the > series. Doesn't taking a node out of the cpuset also take it out of the > fallback lists? There are two mutually exclusive approaches which are described in this patch series. (1) zonelist modification based approach (2) cpuset restriction based approach As mentioned in the cover letter, " NOTE: These two set of patches mutually exclusive of each other and represent two different approaches. Only one of these sets should be applied at any point of time. Set1: mm: Change generic FALLBACK zonelist creation process mm: Change mbind(MPOL_BIND) implementation for CDM nodes Set2: cpuset: Add cpuset_inc() inside cpuset_init() mm: Exclude CDM nodes from task->mems_allowed and root cpuset mm: Ignore cpuset enforcement when allocation flag has __GFP_THISNODE " > >> while ((node = find_next_best_node(local_node, &used_mask)) >= 0) { >> +#ifdef CONFIG_COHERENT_DEVICE >> + /* >> + * CDM node's own zones should not be part of any other >> + * node's fallback zonelist but only it's own fallback >> + * zonelist. >> + */ >> + if (is_cdm_node(node) && (pgdat->node_id != node)) >> + continue; >> +#endif > > On a superficial note: Isn't that #ifdef unnecessary? is_cdm_node() has > a 'return 0' stub when the config option is off anyway. Right, will fix it up. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>