On Tue, Sep 24, 2019 at 12:56:22PM +0200, Michal Hocko wrote:
> On Tue 24-09-19 11:17:14, Peter Zijlstra wrote:
> > On Tue, Sep 24, 2019 at 09:47:51AM +0200, Michal Hocko wrote:
> > > On Mon 23-09-19 22:34:10, Peter Zijlstra wrote:
> > > > On Mon, Sep 23, 2019 at 06:52:35PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > I even the
> > > > > ACPI standard is considering this optional. Yunsheng Lin has referred to
> > > > > the specific part of the standard in one of the earlier discussions.
> > > > > Trying to guess the node affinity is worse than providing all CPUs IMHO.
> > > >
> > > > I'm saying the ACPI standard is wrong.
> > >
> > > Even if you were right on this the reality is that a HW is likely to
> > > follow that standard and we cannot rule out NUMA_NO_NODE being
> > > specified. As of now we would access beyond the defined array and that
> > > is clearly a bug.
> >
> > Right, because the device node is wrong, so we fix _that_!
> >
> > > Let's assume that this is really a bug for a moment. What are you going
> > > to do about that? BUG_ON? I do not really see any solution besides to either
> > > provide something sensible or BUG_ON. If you are worried about a
> > > conditional then this should be pretty easy to solve by starting the
> > > array at -1 index and associate it with the online cpu mask.
> >
> > The same thing I proposed earlier; force the device node to 0 (or any
> > other convenient random valid value) and issue a FW_BUG message to the
> > console.
>
> Why would you "fix" anything and how do you know that node 0 is the
> right choice? I have seen setups with node 0 without any memory and
> similar unexpected things.

We don't know 0 is right; but we know 'unknown' is wrong, so we FW_BUG and
pick _something_.

> To be honest I really fail to see why to object to a simple semantic
> that NUMA_NO_NODE imply all usable cpus. Could you explain that please?

Because it feels wrong. The device needs to be _somewhere_. It simply
cannot be node-less.
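For readers outside the thread, the two positions can be sketched as a tiny userspace C model. This is an illustration under stated assumptions, not kernel code: the two-node topology, the `node_to_cpumask` table, the 32-bit `cpumask_t` and the helper names are all hypothetical. Only the value of `NUMA_NO_NODE` (-1) and the `"[Firmware Bug]: "` prefix that `FW_BUG` expands to are borrowed from the kernel. Option A is the "warn and pick node 0" proposal; option B is the "NUMA_NO_NODE means all usable CPUs" semantic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: 2 nodes, 4 CPUs, one bit per CPU. */
#define NR_NODES     2
#define NUMA_NO_NODE (-1)        /* same value the kernel uses */

typedef uint32_t cpumask_t;      /* hypothetical stand-in for struct cpumask */

static const cpumask_t node_to_cpumask[NR_NODES] = {
	0x3,                     /* node 0: CPUs 0-1 */
	0xc,                     /* node 1: CPUs 2-3 */
};
static const cpumask_t online_mask = 0xf;   /* CPUs 0-3 online */

/* Unchecked lookup: NUMA_NO_NODE would read node_to_cpumask[-1]. */
static cpumask_t cpumask_of_node_raw(int node)
{
	/* The out-of-bounds access the thread calls "clearly a bug". */
	assert(node >= 0 && node < NR_NODES);
	return node_to_cpumask[node];
}

/* Option A: the device must live somewhere, so complain and pick node 0. */
static int sanitize_dev_node(int node)
{
	if (node == NUMA_NO_NODE) {
		fprintf(stderr, "[Firmware Bug]: device has no NUMA node, using node 0\n");
		return 0;
	}
	return node;
}

/* Option B: NUMA_NO_NODE means "any usable CPU". */
static cpumask_t cpumask_of_node_or_all(int node)
{
	if (node == NUMA_NO_NODE)
		return online_mask;
	return cpumask_of_node_raw(node);
}
```

With this model, `cpumask_of_node_raw(sanitize_dev_node(NUMA_NO_NODE))` yields node 0's CPUs (0x3) plus a firmware warning, while `cpumask_of_node_or_all(NUMA_NO_NODE)` yields every online CPU (0xf). The trade-off is the one argued above: A surfaces the firmware problem and may guess wrong; B never guesses but accepts a node-less device.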