Re: low-memory crash with patch "capture a page under direct compaction"

On Tue, 2019-03-05 at 14:42 +0000, Mel Gorman wrote:
> On Mon, Mar 04, 2019 at 10:55:04PM -0500, Qian Cai wrote:
> > Reverting the patches below from linux-next seems to fix a crash while
> > running LTP oom01.
> > 
> > 915c005358c1 mm, compaction: Capture a page under direct compaction -fix
> > e492a5711b67 mm, compaction: capture a page under direct compaction
> > 
> > In particular, removing just this chunk alone seems to fix the problem.
> > 
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -2227,10 +2227,10 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
> >                 }
> > 
> >                 /* Stop if a page has been captured */
> > -               if (capc && capc->page) {
> > -                       ret = COMPACT_SUCCESS;
> > -                       break;
> > -               }
> > 
> 
> It's hard to make sense of how this is connected to the bug. The
> out-of-bounds warning would have required page flags to be corrupted
> quite badly or maybe the use of an uninitialised page. How reproducible
> has this been for you? I just ran the test 100 times with UBSAN and page
> alloc debugging enabled and it completed correctly.
> 

I can reproduce this every time within 3 runs of oom01 on this x86_64 server,
but have not been able to reproduce it on arm64 or ppc64le servers so far.

# for i in `seq 1 3`; do /opt/ltp/testcases/bin/oom01 ; done
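
For anyone trying to reproduce this unattended, the loop above could be
wrapped roughly as follows. This is only a sketch: the helper names
(has_kernel_bug, repro) and the TRIES/OOM01 variables are made up for
illustration, the grep patterns are just the signatures seen in the traces in
this thread, and it assumes the machine survives long enough to read dmesg,
which may not hold after a double fault. Clearing the ring buffer first with
"dmesg -C" avoids matching stale messages.

```shell
#!/bin/sh
# Illustrative sketch only; helper names and variables are hypothetical.
# The oom01 path is the one used in the command above.
OOM01=${OOM01:-/opt/ltp/testcases/bin/oom01}
TRIES=${TRIES:-3}

# Read log text on stdin; succeed if it contains one of the failure
# signatures that appear in the traces in this report.
has_kernel_bug() {
    grep -E -q 'kernel BUG at|BUG: Bad page state|UBSAN: Undefined behaviour'
}

# Run oom01 up to $TRIES times and stop at the first run after which
# the kernel log shows one of the signatures above.
repro() {
    i=1
    while [ "$i" -le "$TRIES" ]; do
        "$OOM01"    # exit status ignored; the test may be OOM-killed
        if dmesg | has_kernel_bug; then
            echo "kernel error after run $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "no kernel error in $TRIES runs"
    return 1
}
```

Call repro as root after sourcing the file; nothing runs until then.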

Sometimes it triggers different traces.

[  391.704320] SLUB: Unable to allocate memory on node -1,
gfp=0x800(GFP_NOWAIT)
[  391.737794]   cache: kmalloc-64, object size: 64, buffer size: 416,
default order: 2, min order: 0
[  391.778079]   node 0: slabs: 5999, objs: 232851, free: 16
[  391.802926]   node 1: slabs: 4303, objs: 167067, free: 37
[  499.866479] ------------[ cut here ]------------
[  499.866500] BUG: Bad page state in process oom01  pfn:fffffe7a09fffd07
[  499.890013] kernel BUG at mm/page_alloc.c:3124!
[  499.935430] double fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  499.971334] CPU: 0 PID: 1623 Comm: oom01 Tainted: G        W
5.0.0-next-20190305+ #49
[  499.992805]
================================================================================
[  500.009887] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9,
BIOS U20 10/25/2017
[  500.009901] RIP: 0010:check_memory_region+0x10/0x1e0
[  500.048252] UBSAN: Undefined behaviour in
kernel/locking/qspinlock.c:138:9
[  500.085378] Code: 00 00 00 48 89 e5 e8 ff 3e 9f 00 5d c3 0f 1f 00 66 2e
0f 1f 84 00 00 00 00 00 48 85 f6 0f 84 68 01 00 00 55 0f b6 d2 48 89 e5
<41> 55 41 54 53 e9 b3 00 00 00 48 b8 00 00 00 00 00 00 00 ff 48 39
[  500.107608] index 8190 is out of range for type 'long unsigned int
[256]'
[  500.138462] RSP: 0000:ffff888428f80000 EFLAGS: 00010002
[  500.223186] CPU: 42 PID: 0 Comm: swapper/42 Tainted: G        W
5.0.0-next-20190305+ #49
[  500.253922] RAX: ffff88827fff41c0 RBX: ffff88827fff41c8 RCX:
ffffffff9c0a9468
[  500.253925] RDX: 0000000000000000 RSI: 0000000000000004 RDI:
ffff88827fff41f8
[  500.277367] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9,
BIOS U20 10/25/2017
[  500.277370] Call Trace:
[  500.318081] RBP: ffff888428f80000 R08: ffffed104fffe840 R09:
ffffed104fffe83f
[  500.318085] R10: ffffed104fffe83f R11: ffff88827fff41fb R12:
ffff88827fff41f8
[  500.349838]  <IRQ>
[  500.381765] R13: ffff88827fff41c8 R14: ffff88842a96f770 R15:
ffff88827fff41c8
[  500.381768] FS:  00007fdfd3559700(0000) GS:ffff8881f3c00000(0000)
knlGS:0000000000000000
[  500.424074]  dump_stack+0x62/0x9a
[  500.435452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.435455] CR2: ffff888428f7fff8 CR3: 000000041abca003 CR4:
00000000001606b0
[  500.467546]  ubsan_epilogue+0xd/0x7f
[  500.500039] Call Trace:
[  500.500042] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat
kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci igb
libahci i2c_algo_bit i2c_core libata dm_mirror dm_region_hash dm_log dm_mod
efivarfs
[  500.509058]  __ubsan_handle_out_of_bounds+0x14d/0x192
[  500.541152] ---[ end trace f9ff2b89b6b88c5f ]---
[  500.541155] invalid opcode: 0000 [#2] SMP DEBUG_PAGEALLOC KASAN PTI
[  500.541159] CPU: 10 PID: 262 Comm: kcompactd0 Tainted: G      D W
5.0.0-next-20190305+ #49
[  500.541161] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9,
BIOS U20 10/25/2017
[  500.541167] RIP: 0010:__isolate_free_page+0x464/0x600
[  500.541170] Code: 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c6 20 6f
0b 9d 48 89 df e8 4a 8b f8 ff 0f 0b 48 c7 c7 a0 32 69 9d e8 51 40 43 00
<0f> 0b 48 c7 c7 e0 31 69 9d e8 43 40 43 00 48 c7 c6 80 71 0b 9d 48
[  500.541172] RSP: 0000:ffff8881f1fdf848 EFLAGS: 00010002
[  500.541175] RAX: 00000000f0000080 RBX: ffffea00064fc000 RCX:
ffff88827fff41d0
[  500.541177] RDX: 1ffffd4000c9f806 RSI: 0000000000000008 RDI:
ffffffff9d9f1640
[  500.541179] RBP: ffff8881f1fdf898 R08: ffffea00064fc000 R09:
ffff8881f1fdfd30
[  500.541181] R10: 0000000000000002 R11: 1ffff1104fffe83b R12:
0000000000000008
[  500.541183] R13: dffffc0000000000 R14: ffff88827fff3000 R15:
0000000000000002
[  500.541185] FS:  0000000000000000(0000) GS:ffff8881f4100000(0000)
knlGS:0000000000000000
[  500.541188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.541190] CR2: 00007fdce416a000 CR3: 000000026ea16002 CR4:
00000000001606a0
[  500.541191] Call Trace:
[  500.541199]  compaction_alloc+0x886/0x25f0
[  500.541221]  unmap_and_move+0x37/0x1e70
[  500.541228]  migrate_pages+0x2ca/0xb20
[  500.541238]  compact_zone+0x19cb/0x3620
[  500.541252]  kcompactd_do_work+0x2df/0x680



