Re: [Bug 206401] kernel panic on Hyper-V after 5 minutes due to memory hot-add

On 02/13/20 at 01:22pm, kabe@xxxxxxxxxxx wrote:
> bhe@xxxxxxxxxx sed in <20200212073123.GG8965@MiWiFi-R3L-srv>
> 
> >> On 02/11/20 at 04:41pm, Andrew Morton wrote:
> >> > On Tue, 11 Feb 2020 07:07:41 +0800 Wei Yang <richardw.yang@xxxxxxxxxxxxxxx> wrote:
> >> > 
> >> > > On Mon, Feb 10, 2020 at 02:15:51PM +0800, Baoquan He wrote:
> >> > > >On 02/10/20 at 02:09pm, Baoquan He wrote:
> >> > > >> On 02/09/20 at 09:56pm, Andrew Morton wrote:
> >> > > >> > On Mon, 10 Feb 2020 13:40:27 +0800 Baoquan He <bhe@xxxxxxxxxx> wrote:
> >> > > >> > 
> >> > > >> > > Hi Andrew,
> >> > > >> > > 
> >> > > >> > > On 02/09/20 at 09:32pm, Andrew Morton wrote:
> >> > > >> > > > On Tue, 04 Feb 2020 11:25:48 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> >> > > >> > > > 
> >> > > >> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> > > >> > > > > 
> >> > > >> > > > 
> >> > > >> > > > An oops during mem hotadd.  Could someone please take a look when
> >> > > >> > > > convenient?
> >> > > >> > > 
> >> > > >> > > This has been addressed by Wei Yang's patch, please check it here:
> >> > > >> > > 
> >> > > >> > > http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> >> > > >> > > 
> >> > > >> > 
> >> > > >> > hm, OK, thanks.  It's unfortunate that a 5.5 fix is buried in a
> >> > > >> > six-patch series which is still in progress!  Can we please merge that
> >> > > >> > as a standalone fix with a cc:stable, Fixes:, etc?
> >> > > >
> >> > > >Maybe can add Fixes tag as follow when merge:
> >> > > >
> >> > > >Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> >> > > >
> >> > 
> >> > The reporter (cc'ed here) is still seeing issues:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=206401
> >> > 
> >> > Could we please continue this investigation via emailed reply-to-all,
> >> > rather than via the bugzilla interface?
> >> 
> >> Yes, people prefer mailing list to discuss issues.
> >> 
> >> Hi T.Kabe, 
> >> 
> >> Could you provide the call trace again after below patch is applied?
> >> The comment #9 in bugzilla is not very clear to me.
> >> 
> >> mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM
> >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> >> 
> >> And, as you said, applying above patch, and do not call
> >> __free_pages_core() in generic_online_page() will work. I doubt it,
> >> because without __free_pages_core(), your added pages are not added
> >> into buddy for managing. I think we should make clear this problem
> >> firstly, in order not to introduce new problem by improper work around,
> >> then check next.
> >> 
> >> Thanks
> >> Baoquan
> 
> Got it, I restarted off fresh from kernel-5.6-rc1,
> applied patch
> >> http://lkml.kernel.org/r/20200209104826.3385-7-bhe@xxxxxxxxxx
> and got the following panic.
> 
> Diag printk's for add_memory() et al is not there, but I guess
> memory hot-add request from hypervisor is returning "success", 
> corrupting something else and bombing out later.
> 
> 
> [   24.289967] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
> [  302.263730] hv_balloon: Max. dynamic memory size: 1048576 MB
> [  635.216014] BUG: unable to handle page fault for address: d13ff000
> [  635.216058] #PF: supervisor write access in kernel mode
> [  635.216076] #PF: error_code(0x0002) - not-present page
> [  635.216106] *pde = 00000000

Thanks for the info. What architecture is your system? Could you attach your
kernel config and paste the output of 'readelf /proc/kcore'?

The pmd entry is not populated (*pde = 00000000), so I want to check which
address range the kernel is accessing. Please also attach the full dmesg log.
My guess is that it is the hot-added page area, since this time the preceding
trace is different from the one in comment #9.
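
For reference, a minimal Python sketch of how the values in the quoted oops
can be decoded. It assumes 32-bit x86 with two-level (non-PAE) paging, which
the single "*pde" line and the CR4 value suggest, and PAGE_OFFSET = 0xc0000000
(the usual 3G/1G split, not yet confirmed by the kernel config). The function
names are made up for illustration, and the physical address is only
meaningful if the faulting address lies in the direct-mapped lowmem region.

```python
PAGE_SHIFT = 12            # 4 KiB pages
PGDIR_SHIFT = 22           # assumption: non-PAE, each PDE covers 4 MiB
PAGE_OFFSET = 0xC0000000   # assumption: default 3G/1G user/kernel split


def split_vaddr(addr):
    """Split a 32-bit virtual address into (PDE index, PTE index, offset)."""
    pde_index = addr >> PGDIR_SHIFT
    pte_index = (addr >> PAGE_SHIFT) & 0x3FF
    offset = addr & 0xFFF
    return pde_index, pte_index, offset


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # 0 means not-present page
        "write": bool(code & 0x2),    # 1 means write access
        "user": bool(code & 0x4),     # 0 means supervisor (kernel) mode
    }


def direct_map_phys(addr):
    """Physical address, assuming addr is in the lowmem direct mapping."""
    return addr - PAGE_OFFSET


if __name__ == "__main__":
    fault_addr = 0xD13FF000  # CR2 from the oops
    pde, pte, off = split_vaddr(fault_addr)
    print("PDE index %d, PTE index %d, offset 0x%x" % (pde, pte, off))
    print("error code:", decode_error_code(0x0002))
    phys = direct_map_phys(fault_addr)
    print("phys 0x%08x, pfn %d" % (phys, phys >> PAGE_SHIFT))
```

If those assumptions hold, the fault is a kernel-mode write to a not-present
page at the last 4 KiB page of a 4 MiB region whose page-directory entry is
empty, which is consistent with the "#PF" lines in the log.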

> [  635.216139] Oops: 0002 [#1] SMP
> [  635.216171] CPU: 0 PID: 470 Comm: systemd-udevd Not tainted 5.6.0-rc1.el8.i586 #1
> [  635.216199] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012
> [  635.216233] EIP: wp_page_copy+0x8e/0x750
> [  635.216253] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [  635.216293] EAX: d13ff000 EBX: c3743f28 ECX: 00000000 EDX: c10c9000
> [  635.216314] ESI: c10c9000 EDI: d13ff004 EBP: c3743eec ESP: c3743ea8
> [  635.216336] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [  635.216368] CR0: 80050033 CR2: d13ff000 CR3: 03add000 CR4: 003406d0
> [  635.216389] Call Trace:
> [  635.216407]  ? reuse_swap_page+0x83/0x390
> [  635.216425]  do_wp_page+0x87/0x6e0
> [  635.216438]  ? __do_sys_fstat64+0x4a/0x60
> [  635.216453]  handle_mm_fault+0x808/0xe30
> [  635.216468]  do_page_fault+0x19f/0x4d0
> [  635.216484]  ? do_kern_addr_fault+0x80/0x80
> [  635.216500]  common_exception_read_cr2+0x15a/0x15f
> [  635.216521] EIP: 0xb7b28104
> [  635.216538] Code: 29 f9 89 4c 24 10 83 f9 0f 0f 86 92 00 00 00 8b 45 40 8d 14 3e 8b 4c 24 0c 39 48 0c 75 74 8b 4c 24 0c 81 7c 24 10 ef 03 00 00 <89> 42 08 89 4a 0c 89 55 40 89 50 0c 76 0e c7 42 10 00 00 00 00 c7
> [  635.216591] EAX: b7c4e7d8 EBX: 000011a0 ECX: b7c4e7d8 EDX: 01994178
> [  635.216606] ESI: 01993168 EDI: 00001010 EBP: b7c4e7a0 ESP: bfcc9f00
> [  635.216628] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00210293
> [  635.216661] Modules linked in: rfkill intel_rapl_msr intel_rapl_common snd_pcm snd_timer snd soundcore crc32_pclmul intel_rapl_perf sg pcspkr hv_netvsc joydev i2c_piix4 hyperv_fb hv_utils hv_balloon ip_tables ext4 mbcache jbd2 sd_mod t10_pi sr_mod cdrom ata_generic hyperv_keyboard hid_hyperv hv_storvsc scsi_transport_fc ata_piix crc32c_intel serio_raw hv_vmbus libata
> [  635.216758] CR2: 00000000d13ff000
> [  635.216769] ---[ end trace dee4a93859538102 ]---
> [  635.216785] EIP: wp_page_copy+0x8e/0x750
> [  635.216811] Code: 03 00 00 8b 45 d0 85 c0 0f 84 46 05 00 00 e8 d9 85 e5 ff 89 45 bc 89 f8 e8 cf 85 e5 ff 8b 55 bc 8d 78 04 8b 0a 83 e7 fc 89 d6 <89> 08 8b 8a fc 0f 00 00 89 88 fc 0f 00 00 89 c1 29 f9 89 55 bc 29
> [  635.216847] EAX: d13ff000 EBX: c3743f28 ECX: 00000000 EDX: c10c9000
> [  635.216864] ESI: c10c9000 EDI: d13ff004 EBP: c3743eec ESP: c3743ea8
> [  635.216883] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210282
> [  635.216899] CR0: 80050033 CR2: d13ff000 CR3: 03add000 CR4: 003406d0
> [  635.216914] Kernel panic - not syncing: Fatal exception
> [  635.216926] Kernel Offset: 0x1400000 from 0xc1000000 (relocation range: 0xc0000000-0xcafeffff)
> [  635.216946] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> -- 
> kabe
> 





