[Ccing Mel and Andrea]

On Fri 28-12-18 21:31:11, Wu Fengguang wrote:
> > > > I haven't looked at the implementation yet but if you are proposing a
> > > > special cased zone lists then this is something CDM (Coherent Device
> > > > Memory) was trying to do two years ago and there was quite some
> > > > skepticism in the approach.
> > >
> > > It looks we are pretty different than CDM. :)
> > > We creating new NUMA nodes rather than CDM's new ZONE.
> > > The zonelists modification is just to make PMEM nodes more separated.
> >
> > Yes, this is exactly what CDM was after. Have a zone which is not
> > reachable without explicit request AFAIR. So no, I do not think you are
> > too different, you just use a different terminology ;)
>
> Got it. OK.. The fall back zonelists patch does need more thoughts.
>
> In long term POV, Linux should be prepared for multi-level memory.
> Then there will arise the need to "allocate from this level memory".
> So it looks good to have separated zonelists for each level of memory.

Well, I do not have a good answer for you here. We do not have good
experiences with those systems, I am afraid. NUMA has been with us for
more than a decade, yet our APIs are coarse to say the least and have
been broken so many times as well. Starting a new API just based on PMEM
sounds like a ticket to another disaster to me.

I would like to see solid arguments why the current model of NUMA nodes
with fallback in distance order cannot be used for those new
technologies in the beginning, and then develop something better based
on the experience we gain on the way.

I would be especially interested in the possibility of the memory
migration idea during memory pressure and relying on NUMA balancing to
resort the locality on demand, rather than hiding certain NUMA nodes or
zones from the allocator and exposing them only to userspace.

> On the other hand, there will also be page allocations that don't care
> about the exact memory level. So it looks reasonable to expect
> different kind of fallback zonelists that can be selected by NUMA policy.
>
> Thanks,
> Fengguang

-- 
Michal Hocko
SUSE Labs