On 16.04.20 18:13, Verma, Vishal L wrote: > On Thu, 2020-04-16 at 08:19 +0200, Michal Hocko wrote: >> On Wed 15-04-20 20:32:00, Verma, Vishal L wrote: >>>> >>>> I really do not like this. Why should we try to be clever and change the >>>> node id requested by the caller? I would just stick with node_possible >>>> check and be done with this. >>> >>> Hi Michal, >>> >>> Being clever allows us to still use the memory even if it is in a non- >>> optimal configuration. Failing here leaves the user no path to add this >>> memory until the firmware is fixed. It is the tradeoff between some >>> usability vs. how loud we want to be for the failure. >> >> Doing that papers over something that is clearly a FW issue and makes >> it "my performance is suboptimal" deal with it OS problem. Really, is >> this something we have to care about. Your changelog talks about a Qemu >> misconfiguration which is trivial to fix. Has this ever been observed >> with a real HW? >> > Well - more of a qemu bug I think - I can share the details, but it just > looked like it was producing a bogus SRAT. I think it is plausible that > such a firmware bug can happen out in the wild. The NFIT tables would > just need to reference a 'proximity domain' that the SRAT hasn't > previously described, and hotplug will happily go add memory from the > NFIT and the backing node related data structures would be missing. > > I'm not too opposed to erroring out, so long as we are ok with the fact > that we will leave some memory stranded until there's a firmware fix. So let's reject it and print a warning, so we know it's a thing. If this actually shows up often in real live, we have good evidence that we should tolerate buggy firmwares instead of warning/rejecting. (rejecting from inside add_memory() still makes sense IMHO) -- Thanks, David / dhildenb