Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add

On 02/14/20 at 10:48pm, Baoquan He wrote:
> On 02/14/20 at 11:26pm, kkabe@xxxxxxxxxxx wrote:
> > bhe@xxxxxxxxxx sed in <20200213081941.GA19207@MiWiFi-R3L-srv>
> > 
> > >> On 02/13/20 at 01:22pm, kabe@xxxxxxxxxxx wrote:
> > >> > bhe@xxxxxxxxxx sed in <20200212073123.GG8965@MiWiFi-R3L-srv>
> > >> > 
> > >> > >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
> > >> > >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang <richardw.yang@xxxxxxxxxxxxxxx> wrote:
> > >> > >> > 
> > >> > >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
> > >> > >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
> > >> > >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
> > >> > >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He <bhe@xxxxxxxxxx> wrote:
> > >> > >> > > >> > 
> > >> > >> > > >> > > Hi Andrew,
> > >> > >> > > >> > > 
> > >> > >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
> > >> > >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > >> > >> > > >> > > > 
> > >> > >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> > >> > >> > > >> > > > > 
> > >> > >> > > >> > > > 
> > >> > >> > > >> > > > An oops during mem hotadd.  Could someone please take a look when
> > >> > >> > > >> > > > convenient?
> > >> > >> > > >> > > 
> > >> > >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
> > >> > >> > > >> > > 
> > >> > >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> > >> > >> > > >> > > 
> > >> > >> > > >> > 
> > >> > >> > > >> > hm, OK, thanks.  It's unfortunate that a 5.5 fix is buried in a
> > >> > >> > > >> > six-patch series which is still in progress!  Can we please merge that
> > >> > >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
> > >> > >> > > >
> > >> > >> > > >Maybe a Fixes tag can be added as follows when merging:
> > >> > >> > > >
> > >> > >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > >> > >> > > >
> > >> > >> > 
> > >> > >> > The reporter (cc'ed here) is still seeing issues:
> > >> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> > >> > >> > 
> > >> > >> > Could we please continue this investigation via emailed reply-to-all,
> > >> > >> > rather than via the bugzilla interface?
> > >> > >> 
> > >> > >> Yes, people prefer to discuss issues on the mailing list.
> > >> > >> 
> > >> > >> Hi T.Kabe, 
> > >> > >> 
> > >> > >> Could you provide the call trace again after the patch below is applied?
> > >> > >> Comment #9 in bugzilla is not very clear to me.
> > >> > >> 
> > >> > >> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
> > >> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> > >> > >> 
> > >> > >> And, as you said, applying the above patch and not calling
> > >> > >> __free_pages_core() in generic_online_page() will work. I doubt it,
> > >> > >> because without __free_pages_core(), your added pages are not handed
> > >> > >> to the buddy allocator for management. I think we should understand
> > >> > >> this problem first, so that an improper workaround does not introduce
> > >> > >> a new problem, and then check what comes next.
> > >> > >> 
> > >> > >> Thanks
> > >> > >> Baoquan
> > >> > 
> > >> > Got it. I started fresh from kernel-5.6-rc1,
> > >> > applied the patch
> > >> > >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> > >> > and got the following panic.
> > >> > 
> > >> > The diagnostic printk's for add_memory() et al. are not there, but I
> > >> > guess the memory hot-add request from the hypervisor is returning
> > >> > "success", corrupting something else, and bombing out later.
> > >> > 
> > >> > 
> > >> > [   24.289967] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
> > >> > [  302.263730] hv_balloon: Max. dynamic memory size: 1048576 MB
> > >> > [  635.216014] BUG: unable to handle page fault for address: d13ff000
> > >> > [  635.216058] #PF: supervisor write access in kernel mode
> > >> > [  635.216076] #PF: error_code(0x0002) - not-present page
> > >> > [  635.216106] *pde = 00000000
> > >> 
> > >> Thanks for the info. What ARCH is your system?  Could you attach your
> > >> kernel config and paste the output of executing 'readelf /proc/kcore'?
> > 
> > Arch is i386(i586), non-PAE.
> > 
> > I'll attach the output of "readelf -a /proc/kcore", dmesg, and .config.
> > The stack trace is different this time as well;
> > it seems to have a slightly different panic trace every time
> > after handle_mm_fault().
> 
> Sorry, I wasn't clear. The output of 'readelf -l /proc/kcore' is enough,
> along with the relevant call trace.

No need to provide them separately; I can find them in the 'readelf -a'
output. I will check and see if I can find anything. Thanks for the info.

> 
> > 
> > I've temporarily added pr_info() before and after add_memory() in hv_balloon.ko,
> > so it says it's tainting the kernel.
> > add_memory() itself is returning 0 (success).
> > 
> > 
> 
> 
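As a reading aid for the panic lines quoted above (error_code(0x0002), faulting address d13ff000, *pde = 00000000), here is a minimal sketch that decodes the low three architectural x86 page-fault error-code bits and splits the address the way 2-level non-PAE i386 paging with 4 KiB pages does. The function names are hypothetical and are not kernel APIs; this is a userspace illustration only, not something taken from the thread.

```python
# Illustrative helpers (hypothetical names, not kernel code) for reading
# the "#PF" lines and the faulting address in the quoted panic.

def decode_pf_error_code(code):
    """Return the meaning of the low three x86 page-fault error-code bits."""
    return {
        "present": bool(code & 0x1),  # 0 = not-present page, 1 = protection violation
        "write": bool(code & 0x2),    # 0 = read access, 1 = write access
        "user": bool(code & 0x4),     # 0 = supervisor mode, 1 = user mode
    }


def split_nonpae_address(addr):
    """Split a 32-bit linear address for 2-level (non-PAE) paging, 4 KiB pages."""
    pde_index = (addr >> 22) & 0x3FF  # top 10 bits select the page-directory entry
    pte_index = (addr >> 12) & 0x3FF  # next 10 bits select the page-table entry
    offset = addr & 0xFFF             # low 12 bits are the offset within the page
    return pde_index, pte_index, offset


if __name__ == "__main__":
    print(decode_pf_error_code(0x0002))
    print(split_nonpae_address(0xD13FF000))
```

Under these assumptions, error code 0x0002 decodes to a supervisor-mode write to a not-present page, which agrees with the "#PF" lines in the log, and address d13ff000 falls in page-directory slot 836, which would be the entry the log reports as "*pde = 00000000".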





