Re: Kernel crash triggered by dd to file with memcg, worst on btrfs

Vladimir Davydov <vdavydov@xxxxxxxxxxxxx> · Thu, 24 Apr 2014 16:19:40 +0400

On 04/24/2014 01:58 AM, Michal Hocko wrote:
>> I have a test case in which I can often crash an entire machine by running
>> dd to a file with a memcg with relatively generous limits. This is
>> simplified from real world problems with heavy disk i/o inside containers.
>>
>> The crashes are easy to trigger when dding to create a file on btrfs. On
>> ext3, typically there is just an error in the kernel log, although
>> occasionally it also crashes.

I have a suspicion that crashes occur, because btrfs doesn't always
expect that kmalloc may fail. At least, when trying to reproduce
the bug, I got the following stack trace, which clearly points to
tree_mod_log_free_eb, where we panic if kmalloc doesn't succeed:

[  728.489378] kernel BUG at fs/btrfs/ctree.c:985!
[  728.489508] invalid opcode: 0000 [#1] SMP
[  728.489636] Modules linked in: btrfs raid6_pq xor xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle xt_multiport xt_limit xt_dscp fuse ebt_among ebtable_filter ebtables bridge stp llc binfmt_misc sbs ppdev sbshc parport_pc parport microcode pcspkr lpc_ich mfd_core e1000 raid1 virtio_balloon shpchp [last unloaded: speedstep_lib]
[  728.490739] CPU: 0 PID: 4053 Comm: dd Not tainted 3.15.0-rc2-mm1+ #160
[  728.490913] Hardware name: Parallels Software International Inc. Parallels Virtual Platform/Parallels Virtual Platform, BIOS 5.0.19471.922416 10/26/2007
[  728.491325] task: ffff880069fe9630 ti: ffff880003dd2000 task.ti: ffff880003dd2000
[  728.491544] RIP: 0010:[<ffffffffa013d497>]  [<ffffffffa013d497>] tree_mod_log_set_root_pointer+0x27/0x40 [btrfs]
[  728.491832] RSP: 0018:ffff880003dd3898  EFLAGS: 00010282
[  728.491975] RAX: 00000000fffffff4 RBX: ffff880076df8000 RCX: 0000000000000007
[  728.492203] RDX: 0000000000003dc0 RSI: ffff880069fe9ed8 RDI: 0000000000000000
[  728.492445] RBP: ffff880003dd3898 R08: ffffffff81d28d40 R09: ffff88007660dd00
[  728.492634] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88006db2c1d0
[  728.492822] R13: ffff8800053d46e0 R14: 0000000000000000 R15: ffff88006db18000
[  728.493010] FS:  00007fa6df2cb700(0000) GS:ffff880079600000(0000) knlGS:0000000000000000
[  728.493239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  728.493441] CR2: 00007fa6debea300 CR3: 000000004c064000 CR4: 00000000000006f0
[  728.493630] Stack:
[  728.493691]  ffff880003dd3948 ffffffffa013dc28 ffff880000000001 0000000000000000
[  728.493918]  0000000000000000 ffffffff8101c809 ffff880003dd38d8 ffffffff810b3a75
[  728.494146]  ffff8800053d46e0 00000000810cfed5 ffff880003dd3a08 0000000000000000
[  728.494469] Call Trace:
[  728.494551]  [<ffffffffa013dc28>] __btrfs_cow_block+0x478/0x530 [btrfs]
[  728.494733]  [<ffffffff8101c809>] ? sched_clock+0x9/0x10
[  728.494881]  [<ffffffff810b3a75>] ? local_clock+0x25/0x30
[  728.495033]  [<ffffffffa013e2de>] btrfs_cow_block+0x11e/0x1d0 [btrfs]
[  728.495230]  [<ffffffffa01407e3>] btrfs_search_slot+0x1f3/0x990 [btrfs]
[  728.495475]  [<ffffffff811d4fcd>] ? __memcg_kmem_get_cache+0x10d/0x290
[  728.495658]  [<ffffffffa01418c6>] btrfs_insert_empty_items+0x76/0xd0 [btrfs]
[  728.495857]  [<ffffffffa019c214>] btrfs_insert_orphan_item+0x64/0x90 [btrfs]
[  728.496053]  [<ffffffffa01678bf>] btrfs_orphan_add+0xef/0x1d0 [btrfs]
[  728.496267]  [<ffffffffa0173219>] btrfs_setattr+0x1f9/0x300 [btrfs]
[  728.496498]  [<ffffffff811fefc2>] notify_change+0x1c2/0x320
[  728.496679]  [<ffffffff811e81a4>] ? flush_old_exec+0x584/0x7d0
[  728.496837]  [<ffffffff811e0473>] do_truncate+0x63/0xa0
[  728.496980]  [<ffffffff811ef7c6>] do_last+0x646/0xec0
[  728.497122]  [<ffffffff811ee9eb>] ? link_path_walk+0x7b/0x810
[  728.497341]  [<ffffffff811f2cf2>] path_openat+0xc2/0x620
[  728.497501]  [<ffffffff810523b7>] ? kvm_clock_read+0x27/0x40
[  728.497654]  [<ffffffff8101c809>] ? sched_clock+0x9/0x10
[  728.497800]  [<ffffffff812002b1>] ? __alloc_fd+0x31/0x160
[  728.497946]  [<ffffffff811f3385>] do_filp_open+0x45/0xa0
[  728.498090]  [<ffffffff81200338>] ? __alloc_fd+0xb8/0x160
[  728.498272]  [<ffffffff811e0bb5>] do_sys_open+0x115/0x230
[  728.498439]  [<ffffffff810d07f5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[  728.498623]  [<ffffffff81355fbe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  728.498796]  [<ffffffff811e0d09>] SyS_open+0x19/0x20
[  728.498933]  [<ffffffff816f23b9>] system_call_fastpath+0x16/0x1b
[  728.499094] Code: ff 0f 1f 00 55 48 8b 87 e8 01 00 00 41 89 d0 48 89 f2 48 8b 37 b9 50 00 00 00 48 89 e5 48 89 c7 e8 9f fc ff ff 85 c0 78 02 c9 c3 <0f> 0b 0f 1f 80 00 00 00 00 eb f7 66 66 66 66 66 2e 0f 1f 84 00
[  728.500178] RIP  [<ffffffffa013d497>] tree_mod_log_set_root_pointer+0x27/0x40 [btrfs]
[  728.500457]  RSP <ffff880003dd3898>

>> Ext3 kernel error log
>> =====================
>>
>> 17:20:05 kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>> 17:20:05 kernel:  cache: ext4_extent_status(2:test), object size: 40, buffer size: 40, default order: 0, min order: 0
>> 17:20:05 kernel:  node 0: slabs: 375, objs: 38250, free: 0
>> 17:20:05 kernel:  node 1: slabs: 128, objs: 13056, free: 0
>> (many times)
> 
> This looks like the kmem limit has been reached and all the further
> allocation fails.

Michal is right. I guess we shouldn't show those warnings when kmalloc
fails due to memcg limit.

> 
>> Btrfs kernel console crash log
>> ==============================
>>
>> BUG: unable to handle kernel paging request at fffffffe36a55230
>> IP: [<ffffffff810f5055>] cpuacct_charge+0x35/0x58
>> PGD 1b5d067 PUD 0
>> Thread overran stack, or stack corrupted
> 
> This is really unexpected. Especially when the stack dumped bellow is
> not a usual suspect. This is a simple interrupt handler which handles
> hrtimer and this shouldn't overflow the stack...

This trace looks mysterious to me either. Usually on overrun we have a
very long call trace, which points to the guilty, but nothing similar
here. Perhaps, the stack page was corrupted by another thread.

> 
>> Oops: 0000 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 6 PID: 5729 Comm: dd Not tainted 3.14.0-elastic #1
>> Hardware name: Supermicro H8DMT-IBX/H8DMT-IBX, BIOS 080014  10/17/2009
>> task: ffff88040a6fdac0 ti: ffff8800d69cc000 task.ti: ffff8800d69cc000
>> RIP: 0010:[<ffffffff810f5055>]  [<ffffffff810f5055>] cpuacct_charge+0x35/0x58
>> RSP: 0018:ffff880827d03d88  EFLAGS: 00010002
>> RAX: 000060f7d80032d0 RBX: ffff88040a6fdac0 RCX: ffffffffd69cc148
>> RDX: ffff88081191a180 RSI: 00000000000ebb99 RDI: ffff88040a6fdac0
>> RBP: ffff880827d03da8 R08: 0000000000000000 R09: ffff880827ffc348
>> R10: ffff880827ffc2a0 R11: ffff880827ffc340 R12: ffffffffd69cc148
>> R13: 00000000000ebb99 R14: fffffffffffebb99 R15: ffff88040a6fdac0
>> FS:  00007f508b54e6f0(0000) GS:ffff880827d00000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: fffffffe36a55230 CR3: 000000080e9d2000 CR4: 00000000000007e0
>> Stack:
>>  0000000000000000 ffff88040a6fdac0 ffff880810fe2800 00000000000ebb99
>>  ffff880827d03dd8 ffffffff810ebbb3 ffff88040a6fdb28 ffff880810fe2800
>>  ffff880827d11bc0 0000000000000000 ffff880827d03e28 ffffffff810eeaaf
>> Call Trace:
>>  <IRQ>
>>  [<ffffffff810ebbb3>] update_curr+0xc2/0x11e
>>  [<ffffffff810eeaaf>] task_tick_fair+0x3d/0x631
>>  [<ffffffff810e5bb7>] scheduler_tick+0x57/0xba
>>  [<ffffffff81108eaf>] ? tick_nohz_handler+0xcf/0xcf
>>  [<ffffffff810cb73d>] update_process_times+0x55/0x66
>>  [<ffffffff81108f2b>] tick_sched_timer+0x7c/0x9b
>>  [<ffffffff810dd0d2>] __run_hrtimer+0x57/0xcc
>>  [<ffffffff810dd4c7>] hrtimer_interrupt+0xd0/0x1db
>>  [<ffffffff810e761b>] ? __vtime_account_system+0x2d/0x31
>>  [<ffffffff8105f8c1>] local_apic_timer_interrupt+0x53/0x58
>>  [<ffffffff81060475>] smp_apic_timer_interrupt+0x3e/0x51
>>  [<ffffffff8186299d>] apic_timer_interrupt+0x6d/0x80
>>  <EOI>
>> Code: 54 53 48 89 fb 48 83 ec 08 48 8b 47 08 4c 63 60 18 e8 84 8c 00 00 48 8b 83 a0 06 00 00 4c 89 e1 48 8b 50 48 48 8b 82 80 00 00 00 <48> 03 04 cd f0 47 bf 81 4c 01 28 48 8b 52 40 48 85 d2 75 e5 e8
>> RIP  [<ffffffff810f5055>] cpuacct_charge+0x35/0x58
>>  RSP <ffff880827d03d88>
>> CR2: fffffffe36a55230
>> ---[ end trace b449af50c3a0711c ]---
>> Kernel panic - not syncing: Fatal exception in interrupt
>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>