Re: [LKP] Re: 87c4696d57 ("mm/debug: Add tests validating architecture page .."): [ 1.395296] kernel BUG at include/linux/mm.h:2007!

On 01/07/2020 12:00 PM, Rong Chen wrote:
> 
> 
> On 1/7/20 1:57 PM, Anshuman Khandual wrote:
>> On 12/26/2019 02:19 PM, kernel test robot wrote:
>>> 46cf053efe  Linux 5.5-rc3
>>> 87c4696d57  mm/debug: Add tests validating architecture page table helpers
>>> +------------------------------------------+----------+------------+
>>> |                                          | v5.5-rc3 | 87c4696d57 |
>>> +------------------------------------------+----------+------------+
>>> | boot_successes                           | 32       | 0          |
>>> | boot_failures                            | 0        | 11         |
>>> | kernel_BUG_at_include/linux/mm.h         | 0        | 11         |
>>> | invalid_opcode:#[##]                     | 0        | 11         |
>>> | EIP:pgtable_pmd_page_dtor                | 0        | 11         |
>>> | Kernel_panic-not_syncing:Fatal_exception | 0        | 11         |
>>> +------------------------------------------+----------+------------+
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>>>
>>> [    1.390624] smp: Brought up 1 node, 2 CPUs
>>> [    1.390624] smpboot: Max logical packages: 2
>>> [    1.390624] smpboot: Total of 2 processors activated (8783.48 BogoMIPS)
>>> [    1.391537] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
>>> [    1.392382] page:f29b85c0 refcount:0 mapcount:0 mapping:00000000 index:0x0
>>> [    1.393415] raw: 02800000 f29b8624 f29b8584 00000000 00000000 edc22280 ffffffff 00000000
>>> [    1.394178] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
>>> [    1.394820] ------------[ cut here ]------------
>>> [    1.395296] kernel BUG at include/linux/mm.h:2007!
>>> [    1.395942] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
>>> [    1.396463] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc3-00001-g87c4696d57b5e #1
>>> [    1.396722] EIP: pgtable_pmd_page_dtor+0x1a/0x23
>>> [    1.396722] Code: d4 8a 27 c2 e8 16 81 04 00 b2 01 5b 88 d0 5d c3 55 89 e5 52 89 45 fc 8b 45 fc 83 78 08 00 74 0c ba e1 e2 e0 c1 e8 14 99 13 00 <0f> 0b e8 92 eb 13 00 c9 c3 55 89 e5 52 89 45 fc 8b 45 fc 90 8d 74
>>> [    1.396722] EAX: c1e0e2e1 EBX: 2dc2e000 ECX: 00000000 EDX: c1e0e2e1
>>> [    1.396722] ESI: edc2b000 EDI: edc4e010 EBP: ee287f14 ESP: ee287f10
>>> [    1.396722] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
>>> [    1.396722] CR0: 80050033 CR2: ffffffff CR3: 0226a000 CR4: 001406b0
>>> [    1.396722] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>> [    1.396722] DR6: fffe0ff0 DR7: 00000400
>>> [    1.396722] Call Trace:
>>> [    1.396722]  mop_up_one_pmd+0x48/0x62
>>> [    1.396722]  pgd_free+0x35/0xe0
>>> [    1.396722]  __mmdrop+0x42/0x96
>>> [    1.396722]  debug_vm_pgtable+0x460/0x47c
>>> [    1.396722]  kernel_init_freeable+0x84/0x172
>>> [    1.396722]  ? rest_init+0xe9/0xe9
>>> [    1.396722]  kernel_init+0xd/0xe9
>>> [    1.396722]  ret_from_fork+0x1e/0x28
>>> [    1.396722] Modules linked in:
>>> [    1.396742] ---[ end trace 9c6f11143a94c590 ]---
>>> [    1.397197] EIP: pgtable_pmd_page_dtor+0x1a/0x23
>> Hello,
>>
>> Wondering if some one could help me with steps to reproduce this crash ?
>> Could not reproduce the problem with the patch applied on Linux 5.5-rc3
>> when built with the config file provided here on a standard KVM guest.
>>
>> - Anshuman
> 
> Hi Anshuman,
> 
> You can compile the kernel with config-5.5.0-rc3-00001-g87c4696d57b5e, and run the reproduce script.
> Both files are in the original report mail.

I did compile the kernel (5.5-rc3 with this patch) using the given config
file config-5.5.0-rc3-00001-g87c4696d57b5e. I tried building the kernel
both with and without ("ARCH=i386 olddefconfig prepare modules_prepare
bzImage"), as two separate experiments.

> 
> # ./reproduce-yocto-vm-yocto-f91855057302-20191226051639-i386-randconfig-a001-20191225-5.5.0-rc3-00001-g87c4696d57b5e-1 ~/linux/arch/x86/boot/bzImage 2>&1 | tail -20
> [    1.471128] Call Trace:
> [    1.471128]  mop_up_one_pmd+0x48/0x62
> [    1.471128]  pgd_free+0x33/0xcc
> [    1.471128]  __mmdrop+0x42/0x96
> [    1.471128]  debug_vm_pgtable+0x45d/0x465
> [    1.471128]  kernel_init_freeable+0x83/0x16b
> [    1.471128]  ? rest_init+0xe0/0xe0
> [    1.471128]  kernel_init+0xd/0xe9
> [    1.471128]  ret_from_fork+0x1e/0x28
> [    1.471128] Modules linked in:
> [    1.471134] ---[ end trace b241750e0a95311e ]---
> [    1.471570] EIP: pgtable_pmd_page_dtor+0x1a/0x23
> [    1.472006] Code: ba 9b 0b df c1 e8 eb 71 04 00 5b 89 f0 5e 5d c3 55 89 e5 52 89 45 fc 8b 45 fc 83 78 08 00 74 0c ba b6 0b df c1 e8 d6 51 13 00 <0f> 0b e8 c6 a3 13 00 c9 c3 55 89 e5 52 89 45 fc 8b 45 fc 90 8d 74
> [    1.473746] EAX: c1df0bb6 EBX: 2e42d000 ECX: 00000000 EDX: c1df0bb6
> [    1.474340] ESI: ee42b000 EDI: ee44e008 EBP: eea87f20 ESP: eea87f1c
> [    1.474465] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
> [    1.475112] CR0: 80050033 CR2: ffffffff CR3: 02242000 CR4: 001406b0
> [    1.475712] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [    1.476299] DR6: fffe0ff0 DR7: 00000400
> [    1.476661] Kernel panic - not syncing: Fatal exception

In both cases, I could not reproduce the problem after following the
above test procedure. Am I missing something here?

[    0.983425] TSC deadline timer enabled
[    0.984054] smpboot: CPU0: Intel Core Processor (Haswell) (family: 0x6, model: 0x3c, stepping: 0x1)
[    0.984054] Performance Events: unsupported p6 CPU model 60 no PMU driver, software events only.
[    0.984122] rcu: Hierarchical SRCU implementation.
[    0.986937] smp: Bringing up secondary CPUs ...
[    0.988760] x86: Booting SMP configuration:
[    0.989499] .... node  #0, CPUs:      #1
[    0.403123] kvm-clock: cpu 1, msr 2c35041, secondary cpu clock
[    0.403123] masked ExtINT on CPU#1
[    0.403123] smpboot: CPU 1 Converting physical 0 to logical die 1
[    0.997431] KVM setup async PF for cpu 1
[    0.998057] kvm-stealtime: cpu 1, msr 23ed19f00
[    0.998763] smp: Brought up 1 node, 2 CPUs
[    0.998763] smpboot: Max logical packages: 2
[    0.998763] smpboot: Total of 2 processors activated (8782.17 BogoMIPS)
[    1.000952] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers --> [Test Ran]
[    1.002305] devtmpfs: initialized
[    1.002305] version magic: 0x3530342a
[    1.005978] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns
[    1.007404] futex hash table entries: 512 (order: 4, 65536 bytes, linear)
[    1.008515] pinctrl core: initialized pinctrl subsystem

The previously reported error log here

[    1.390624] smp: Brought up 1 node, 2 CPUs
[    1.390624] smpboot: Max logical packages: 2
[    1.390624] smpboot: Total of 2 processors activated (8783.48 BogoMIPS)
[    1.391537] debug_vm_pgtable: debug_vm_pgtable: Validating architecture page table helpers
[    1.392382] page:f29b85c0 refcount:0 mapcount:0 mapping:00000000 index:0x0
[    1.393415] raw: 02800000 f29b8624 f29b8584 00000000 00000000 edc22280 ffffffff 00000000
[    1.394178] page dumped because: VM_BUG_ON_PAGE(page->pmd_huge_pte)
[    1.394820] ------------[ cut here ]------------
[    1.395296] kernel BUG at include/linux/mm.h:2007!
[    1.395942] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[    1.396463] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc3-00001-g87c4696d57b5e #1
[    1.396722] EIP: pgtable_pmd_page_dtor+0x1a/0x23
[    1.396722] Code: d4 8a 27 c2 e8 16 81 04 00 b2 01 5b 88 d0 5d c3 55 89 e5 52 89 45 fc 8b 45 fc 83 78 08 00 74 0c ba e1 e2 e0 c1 e8 14 99 13 00 <0f> 0b e8 92 eb 13 00 c9 c3 55 89 e5 52 89 45 fc 8b 45 fc 90 8d 74
[    1.396722] EAX: c1e0e2e1 EBX: 2dc2e000 ECX: 00000000 EDX: c1e0e2e1
[    1.396722] ESI: edc2b000 EDI: edc4e010 EBP: ee287f14 ESP: ee287f10
[    1.396722] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[    1.396722] CR0: 80050033 CR2: ffffffff CR3: 0226a000 CR4: 001406b0
[    1.396722] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    1.396722] DR6: fffe0ff0 DR7: 00000400
[    1.396722] Call Trace:
[    1.396722]  mop_up_one_pmd+0x48/0x62
[    1.396722]  pgd_free+0x35/0xe0
[    1.396722]  __mmdrop+0x42/0x96
[    1.396722]  debug_vm_pgtable+0x460/0x47c
[    1.396722]  kernel_init_freeable+0x84/0x172
[    1.396722]  ? rest_init+0xe9/0xe9
[    1.396722]  kernel_init+0xd/0xe9
[    1.396722]  ret_from_fork+0x1e/0x28
[    1.396722] Modules linked in:
[    1.396742] ---[ end trace 9c6f11143a94c590 ]---
[    1.397197] EIP: pgtable_pmd_page_dtor+0x1a/0x23

might be generated from this path:

kernel BUG at include/linux/mm.h:2007!

debug_vm_pgtable()
 __mmdrop()
  pgd_free()
   pgd_mop_up_pmds()
    mop_up_one_pmd()
     pmd_free()
      pgtable_pmd_page_dtor()

static inline void pgtable_pmd_page_dtor(struct page *page)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
        VM_BUG_ON_PAGE(page->pmd_huge_pte, page);  /* <--- BUG fires here */
#endif
        ptlock_free(page);
}
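
To make the failing check concrete, here is a minimal user-space sketch
of the invariant that pgtable_pmd_page_dtor() asserts. It is a model and
not kernel code: struct model_page, model_pmd_page_ctor() and
model_pmd_page_dtor_ok() are hypothetical names, and the idea that the
matching constructor leaves pmd_huge_pte cleared is an assumption on my
part, not something established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the field the check reads. */
struct model_page {
        void *pmd_huge_pte;
};

/* Assumed constructor behaviour: a fresh PMD page has nothing deposited. */
static void model_pmd_page_ctor(struct model_page *page)
{
        assert(page != NULL);
        page->pmd_huge_pte = NULL;
}

/*
 * Models the VM_BUG_ON_PAGE() condition: returns true when teardown may
 * proceed, false where the kernel would hit the BUG.
 */
static bool model_pmd_page_dtor_ok(const struct model_page *page)
{
        assert(page != NULL);
        return page->pmd_huge_pte == NULL;
}
```

In this model, any write to the field between construction and teardown
(or a teardown of a page that never went through the constructor) is
enough to make the check fail.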

In debug_vm_pgtable(), a minimal page table is created with the
allocation helpers, used to perform various tests, and then freed.

        ...............................................
        mm = mm_alloc();
        if (!mm) {
                pr_err("mm_struct allocation failed\n");
                return;
        }
        ...............................................
        pgdp = pgd_offset(mm, vaddr);
        p4dp = p4d_alloc(mm, pgdp, vaddr);
        pudp = pud_alloc(mm, p4dp, vaddr);
        pmdp = pmd_alloc(mm, pudp, vaddr);
        ptep = pte_alloc_map(mm, pmdp, vaddr);
        ...............................................
        saved_p4dp = p4d_offset(pgdp, 0UL);
        saved_pudp = pud_offset(p4dp, 0UL);
        saved_pmdp = pmd_offset(pudp, 0UL);
        saved_ptep = pmd_pgtable(pmd);
        ...............................................
        p4d_free(mm, saved_p4dp);
        pud_free(mm, saved_pudp);
        pmd_free(mm, saved_pmdp);
        pte_free(mm, saved_ptep);
        mm_dec_nr_puds(mm);
        mm_dec_nr_pmds(mm);
        mm_dec_nr_ptes(mm);
        __mmdrop(mm);
        ...............................................
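
One way to reason about this sequence, offered purely as an assumption
and not as a confirmed diagnosis: the trace shows pgd_free() itself
tearing down a PMD through mop_up_one_pmd(). If some configuration has
the PGD own its PMD pages, a PMD already released with pmd_free() could
be torn down a second time by __mmdrop(). The user-space sketch below
models only that ownership question. Every name in it (struct model_mm,
pgd_owns_pmd, model_pmd_free(), model_mmdrop(),
model_sequence_double_frees()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one PMD page and a count of its teardowns. */
struct model_mm {
        bool pgd_owns_pmd;   /* assumption: pgd_free() also frees the PMD */
        int  pmd_teardowns;  /* how many times the PMD destructor ran */
};

/* Models the explicit pmd_free() call made by the test. */
static void model_pmd_free(struct model_mm *mm)
{
        assert(mm != NULL);
        mm->pmd_teardowns++;
}

/* Models __mmdrop(): tears the PMD down again only if the PGD owns it. */
static void model_mmdrop(struct model_mm *mm)
{
        assert(mm != NULL);
        if (mm->pgd_owns_pmd)
                mm->pmd_teardowns++;
}

/* Runs the free sequence; reports whether the PMD was torn down twice. */
static bool model_sequence_double_frees(bool pgd_owns_pmd)
{
        struct model_mm mm = {
                .pgd_owns_pmd = pgd_owns_pmd,
                .pmd_teardowns = 0,
        };

        model_pmd_free(&mm);
        model_mmdrop(&mm);
        return mm.pmd_teardowns > 1;
}
```

Under this model the sequence is harmless when the PGD does not own its
PMDs and tears the PMD down twice when it does, which would be
consistent with a failure that shows up only on some configurations.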

Is the above page table allocation and free sequence problematic for any
particular x86 configuration? I have not seen this sequence fail on
either arm64 or x86, but the config option coverage during my
experiments was limited. Any suggestions or pointers are welcome.

- Anshuman

> 
> Best Regards,
> Rong Chen
> 




