On 29/05/18 22:52, Jeremy Linton wrote:
> Hi,
>
> On 05/29/2018 10:51 AM, Geert Uytterhoeven wrote:
>> Hi Will,
>>
>> On Tue, May 29, 2018 at 5:08 PM, Will Deacon <will.deacon@xxxxxxx> wrote:
>>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla
>>>>> <sudeep.holla@xxxxxxx> wrote:
>>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>>
>>>>>>>     R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>>>     R-Car M3-N (2xCA57)
>>>>>>>
>>>>>>> Reverting this commit fixes the issue for me.
>>>>>>
>>>>>> I can't find anything that relates to system suspend in these patches,
>>>>>> unless they are messing with something during CPU hot plug-in back
>>>>>> during resume.
>>>>>
>>>>> It's only the last patch that introduces the breakage.
>>>>
>>>> As specified in the commit log, it won't change any behavior for DT
>>>> systems if it's a non-NUMA or single-node system. So I am still wondering
>>>> what could trigger this regression.
>>>
>>> I wonder if we're somehow giving an uninitialised/invalid NUMA
>>> configuration to the scheduler, although I can't see how this would
>>> happen.
>>>
>>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff
>>> below, do you see anything shouting in dmesg?
>>
>> Thanks, but unfortunately it doesn't help.
>> I added some debug code to print the cpumasks, but so far I don't see
>> anything suspicious.
>
> I suspect most of the problem is related to the node mask changing at
> unexpected times (particularly cores being removed from the mask). Once
> I understand that more, there may be a simpler patch.
>
> OTOH, I've been testing with this, and with it I can't seem to
> duplicate the problem I found with CONFIG_NUMA disabled.

I am also giving it a run on my Juno (defconfig - CONFIG_NUMA) and the
CPU hotplug tests are fine with this change.

--
Regards,
Sudeep
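
[Editor's note: the diff Will refers to and Geert's debug code are not
included in this excerpt. As a rough illustration only, a minimal sketch
of the kind of cpumask debug print described above might hook a dynamic
CPU hotplug state and use the kernel's %*pbl bitmap-list format; the
function name and hook point here are assumptions, not taken from the
thread.]

    /*
     * Illustrative sketch only: dump the online mask and the hotplugged
     * CPU's NUMA node mask each time a CPU comes online, so changes to
     * the node mask across suspend/resume show up in dmesg.
     */
    #include <linux/cpu.h>
    #include <linux/cpumask.h>
    #include <linux/printk.h>
    #include <linux/topology.h>

    static int debug_cpu_online(unsigned int cpu)
    {
            int node = cpu_to_node(cpu);

            pr_info("cpu%u online: online=%*pbl node%d=%*pbl\n",
                    cpu, cpumask_pr_args(cpu_online_mask),
                    node, cpumask_pr_args(cpumask_of_node(node)));
            return 0;
    }

    /*
     * Registered once at init, e.g.:
     * cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "debug:cpumask",
     *                   debug_cpu_online, NULL);
     */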