Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

Marc MERLIN <marc@xxxxxxxxxxx> · Wed, 30 Nov 2016 09:47:13 -0800

On Tue, Nov 29, 2016 at 10:01:10AM -0800, Linus Torvalds wrote:
> On Tue, Nov 29, 2016 at 9:40 AM, Marc MERLIN <marc@xxxxxxxxxxx> wrote:
> >
> > In my case, it is a 5x 4TB HDD with
> > software raid 5 < bcache < dmcrypt < btrfs
> 
> It doesn't sound like the nasty situations I have seen (particularly
> with large USB flash storage - often high momentary speed for
> benchmarks, but slows down to a crawl after you've written a bit to
> it, and doesn't have the smart garbage collection that modern "real"
> SSDs have).

I gave it some more thought, and I think it is exactly the nasty situation
you described.
bcache accepts I/O quickly while it is writing to the SSD cache. Once the
SSD fills up, bcache can no longer handle I/O as quickly and has to stall
until the SSD has been flushed to the spinning-rust drives.
This is exactly the same as filling up the cache on a USB key and then
waiting for slow writes to flash, is it not?

With your dirty ratio workaround, I was able to re-enable bcache and
have it not fall over, but only barely. I recorded over a hundred
workqueues in flight during the copy at some point (just not enough
to actually kill the kernel this time).
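Why a lower dirty ratio helps, as a rough sketch: vm.dirty_ratio caps how much dirty page cache may build up (as a percentage of dirtyable memory, i.e. roughly free plus reclaimable memory, not total RAM) before writers are throttled, and a non-zero vm.dirty_bytes overrides the ratio. The function below is a simplified approximation of that rule. The function name and the example values are hypothetical, and these are not the exact settings used in this thread.

```python
def dirty_threshold_bytes(dirtyable_bytes, dirty_ratio=20, dirty_bytes=0):
    """Approximate point at which the kernel starts throttling writers.

    Simplified sketch: a non-zero vm.dirty_bytes takes precedence over
    vm.dirty_ratio (default 20%); otherwise the limit is a percentage of
    dirtyable memory, which is smaller than total RAM.
    """
    if dirty_bytes:
        return dirty_bytes
    return dirtyable_bytes * dirty_ratio // 100


GiB = 2**30
# Hypothetical box with 32 GiB of dirtyable memory.
print(dirty_threshold_bytes(32 * GiB) / GiB)                 # default ratio
print(dirty_threshold_bytes(32 * GiB, dirty_ratio=2) / GiB)  # lowered ratio
```

Lowering the limit bounds how much data can pile up in front of a slow stack such as bcache on top of RAID5. That is presumably why the workaround keeps the number of in-flight requests down, even if it does not remove the underlying stall.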

I've started a bcache followup on this here:
http://marc.info/?l=linux-bcache&m=148052441423532&w=2
http://marc.info/?l=linux-bcache&m=148052620524162&w=2

This message shows the huge pileup of workqueues in bcache
just before the kernel dies with:
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
task: ffff9ee0c2fa4180 task.stack: ffff9ee0c2fa8000
RIP: 0010:[<ffffffffbb57a128>]  [<ffffffffbb57a128>] cpuidle_enter_state+0x119/0x171
RSP: 0000:ffff9ee0c2fabea0  EFLAGS: 00000246
RAX: ffff9ee0de3d90c0 RBX: 0000000000000004 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 0000000000000007 RDI: 0000000000000000
RBP: ffff9ee0c2fabed0 R08: 0000000000000f92 R09: 0000000000000f42
R10: ffff9ee0c2fabe50 R11: 071c71c71c71c71c R12: ffffe047bfdcb200
R13: 00000af626899577 R14: 0000000000000004 R15: 00000af6264cc557
FS:  0000000000000000(0000) GS:ffff9ee0de3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000898b000 CR3: 000000045cc06000 CR4: 00000000001406e0
Stack:
 0000000000000f40 ffffe047bfdcb200 ffffffffbbccc060 ffff9ee0c2fac000
 ffff9ee0c2fa8000 ffff9ee0c2fac000 ffff9ee0c2fabee0 ffffffffbb57a1ac
 ffff9ee0c2fabf30 ffffffffbb09238d ffff9ee0c2fa8000 0000000700000004
Call Trace:
 [<ffffffffbb57a1ac>] cpuidle_enter+0x17/0x19
 [<ffffffffbb09238d>] cpu_startup_entry+0x210/0x28b
 [<ffffffffbb03de22>] start_secondary+0x13e/0x140
Code: 00 00 00 48 c7 c7 cd ae b2 bb c6 05 4b 8e 7a 00 01 e8 17 6c ae ff fa 66 0f 1f 44 00 00 31 ff e8 75 60 b4
44 00 00 <4c> 89 e8 b9 e8 03 00 00 4c 29 f8 48 99 48 f7 f9 ba ff ff ff 7f
Kernel panic - not syncing: Hard LOCKUP

A full traceback showing the pileup of requests is here:
http://marc.info/?l=linux-bcache&m=147949497808483&w=2

and here:
http://pastebin.com/rJ5RKUVm
(two different traces, but mostly the same result)

We can probably continue on the bcache thread I Cc'ed you on, since I'm
no longer sure whether the fault here lies with bcache or with the VM subsystem.

Thanks.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .