On Thu, Nov 16, 2017 at 02:41:39PM -0800, chetan L wrote:
> On Thu, Nov 16, 2017 at 1:29 PM, Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
> >
> > For the NUMA discussion, this is related to CPU-less nodes, i.e. not wanting
> > to add any more CPU-less nodes (nodes with only memory), and there are other
> > aspects too. For instance, you do not necessarily have good information
> > from the device to know if a page is accessed a lot by the device (this
> > kind of information is often only accessible by the device driver). Thus
>
> @Jerome - one comment w.r.t. 'do not necessarily have good info on
> device access'.
>
> So you could be assuming a few things here :). CCIX extends the CPU
> complex's coherency domain (it is now a single/unified coherency
> domain). The CCIX-EP (let's say an accelerator/XPU or a NIC or a combo)
> is now a true peer w.r.t. the host NUMA node(s) (aka a first-class
> citizen). I don't know how much info was revealed at the latest ARM
> TechCon where CCIX was presented. So I cannot divulge any further
> details until I see that slide deck. However, you can safely assume
> that the host will have *all* the info w.r.t. the device access and
> vice versa.

I do have access to CCIX. The last time I read the draft, a few months
ago, my understanding was that there is no mechanism to differentiate
between devices behind the root complex. So when you do autonuma, you
don't know which of your CCIX devices is the one faulting, hence you
cannot keep track of that inside struct page for autonuma (ignoring the
issue with the lack of a CPUID for each device).

This is what I mean by NUMA not being a good fit as it is. Yes,
everything is cache coherent and all, but that is just a small part of
what is needed to make autonuma, as it is today, work.

Jérôme
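
To make the struct page point above concrete, here is a minimal
user-space sketch, not kernel code. It assumes only that autonuma
remembers the last faulting CPU and a few PID bits per page, which is
what the kernel's last_cpupid hint does. Every name prefixed with
sketch_ is hypothetical, and the field widths are illustrative rather
than the kernel's real configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative field widths, not the kernel's actual configuration. */
#define SKETCH_PID_BITS 8
#define SKETCH_CPU_BITS 8
#define SKETCH_PID_MASK ((1u << SKETCH_PID_BITS) - 1)
#define SKETCH_CPU_MASK ((1u << SKETCH_CPU_BITS) - 1)

/* Hypothetical stand-in for the per-page hint that autonuma keeps. */
struct sketch_page {
    uint32_t last_cpupid;
};

/* Pack a CPU id and the low PID bits into a single hint value. */
static uint32_t sketch_make_cpupid(unsigned int cpu, unsigned int pid)
{
    return ((cpu & SKETCH_CPU_MASK) << SKETCH_PID_BITS) |
           (pid & SKETCH_PID_MASK);
}

/* Recover the CPU id from a packed hint value. */
static unsigned int sketch_cpupid_to_cpu(uint32_t cpupid)
{
    return (cpupid >> SKETCH_PID_BITS) & SKETCH_CPU_MASK;
}

/* A fault as the host sees it (assumed shape, for illustration only). */
struct sketch_fault {
    bool from_cpu;     /* false: some device behind the root complex */
    unsigned int cpu;  /* meaningful only when from_cpu is true */
    unsigned int pid;  /* meaningful only when from_cpu is true */
};

/*
 * Record who touched the page last. Returns false when the fault cannot
 * be attributed: a device fault carries no per-device id to encode, so
 * the hint is left unchanged.
 */
static bool sketch_record_fault(struct sketch_page *page,
                                const struct sketch_fault *fault)
{
    if (!fault->from_cpu)
        return false;
    page->last_cpupid = sketch_make_cpupid(fault->cpu, fault->pid);
    return true;
}
```

A CPU fault updates the hint and the CPU id can be read back with
sketch_cpupid_to_cpu(), while a fault from a device leaves the page
with no usable record of which device touched it. That is the gap
described above, independent of whether the device is cache coherent.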