Re: + mm-fix-panic-in-__alloc_pages.patch added to -mm tree

On Mon 15-11-21 22:52:27, Dennis Zhou wrote:
> On Mon, Nov 15, 2021 at 11:11:44PM +0000, Alexey Makhalov wrote:
> > 
> > 
> > > On Nov 15, 2021, at 4:58 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > 
> > > On Mon 15-11-21 11:04:16, Alexey Makhalov wrote:
> > >> Hi Michal,
> > >> 
> > >>> 
> > >>> I have asked several times for details about the specific setup that has
> > >>> led to the reported crash. Without much success so far. Reproduction
> > >>> steps would be the first step. That would allow somebody to work on this
> > >>> at least if Alexey doesn't have time to dive into this deeper.
> > >>> 
> > >> 
> > >> I didn’t know that repro steps are still not clear.
> > >> 
> > >> To reproduce the panic you need a system where you can hot add a CPU
> > >> that belongs to a memoryless NUMA node which is not yet present and
> > >> online. In other words, by hot adding the CPU, you add both the CPU and
> > >> the NUMA node at the same time.
> > > 
> > > There seems to be something different in your setup because memoryless
> > > nodes have reportedly worked on x86. I suspect something must be
> > > different in your setup. Maybe it is that you are adding a cpu that is
> > > outside of possible cpus initialized during boot time. Those should have
> > > their nodes initialized properly - at least per init_cpu_to_node. Your
> > > report doesn't really explain how the cpu is hotadded. Maybe you are
> > > trying to do something that has never been supported on x86.
> > Memoryless nodes are supported on x86, but hot add of such nodes is not
> > quite done.
> > 
> 
> I need some clarification here. It sounds like memoryless nodes work on
> x86, but hotplug + memoryless nodes isn't a supported use case or you're
> introducing it as a new use case?
> 
> If this is a new use case, then I'm inclined to say this patch should
> NOT go in and a proper fix should be implemented on hotplug's side. I
> don't want to be in the business of having/seeing this conversation
> reoccur because we just papered over this issue in percpu.

The patch still seems to be in the mmotm tree. I have sent a different
fix candidate [1] which should be more robust and also cover other
potential places.

[1] http://lkml.kernel.org/r/20211214100732.26335-1-mhocko@xxxxxxxxxx
-- 
Michal Hocko
SUSE Labs


