On Mon 15-11-21 22:52:27, Dennis Zhou wrote:
> On Mon, Nov 15, 2021 at 11:11:44PM +0000, Alexey Makhalov wrote:
> >
> >
> > > On Nov 15, 2021, at 4:58 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
> > >> Hi Michal,
> > >>
> > >>>
> > >>> I have asked several times for details about the specific setup that has
> > >>> led to the reported crash. Without much success so far. Reproduction
> > >>> steps would be the first step. That would allow somebody to work on this
> > >>> at least if Alexey doesn't have time to dive into this deeper.
> > >>>
> > >>
> > >> I didn’t know that repro steps are still not clear.
> > >>
> > >> To reproduce the panic you need to have a system, where you can hot add
> > >> the CPU that belongs to memoryless NUMA node which is not present and onlined
> > >> yet. In other words, by hot adding CPU, you will add both CPU and NUMA node
> > >> at the same time.
> > >
> > > There seems to be something different in your setup because memory less
> > > nodes have reportedly worked on x86. I suspect something must be
> > > different in your setup. Maybe it is that you are adding a cpu that is
> > > outside of possible cpus intialized during boot time. Those should have
> > > their nodes initialized properly - at least per init_cpu_to_node. Your
> > > report doesn't really explain how the cpu is hotadded. Maybe you are
> > > trying to do something that has never been supported on x86.
> > Memoryless nodes are supported by x86. But hot add of such nodes not quite
> > done.
> >
>
> I need some clarification here. It sounds like memoryless nodes work on
> x86, but hotplug + memoryless nodes isn't a supported use case or you're
> introducing it as a new use case?
>
> If this is a new use case, then I'm inclined to say this patch should
> NOT go in and a proper fix should be implemented on hotplug's side. I
> don't want to be in the business of having/seeing this conversation
> reoccur because we just papered over this issue in percpu.

The patch still seems to be in the mmotm tree. I have sent a different
fix candidate [1] which should be more robust and cover also other
potential places.

[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@xxxxxxxxxx
--
Michal Hocko
SUSE Labs