Re: 4.6.2 frequent crashes under memory + IO pressure

Johannes Stezenbach wrote:
> What is your opinion about older kernels (4.4, 4.5) working?
> I think I've seen some OOM messages with the older kernels,
> Jill was killed and I restarted the build to complete it.
> A full bisect would take more than a day, I don't think
> I have the time for it.
> Since I use dm-crypt + lvm, should we add more Cc or do
> you think it is an mm issue?

I have no idea.

> > > Below I'm pasting some log snippets, let me know if you like
> > > it so much you want more of it ;-/  The total log is about 1.7MB.
> > 
> > Yes, I'd like to browse it. Could you send it to me?
> 
> Did you get any additional insights from it?

I found

[ 2245.660712] DMA free:4kB min:32kB
[ 2245.707031] DMA32 free:0kB min:6724kB
[ 2245.757597] Normal free:24kB min:928kB
[ 2245.806515] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2245.816359] DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2245.826378] Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB

[ 2317.853951] DMA free:0kB min:32kB
[ 2317.900460] DMA32 free:0kB min:6724kB
[ 2317.951574] Normal free:0kB min:928kB
[ 2318.000808] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2318.010713] DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2318.020767] Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB

which shows that memory reserves were completely depleted. So, please try commit
78ebc2f7146156f4 ("mm,writeback: don't use memory reserves for wb_start_writeback")
on your 4.6.2 kernel. As far as I know, passing the mem=4G option will have an
equivalent effect.
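
The depletion in the snippets above can be checked mechanically across a long
log. Here is a minimal Python sketch, assuming the lines keep the abbreviated
"ZONE free:NkB min:NkB" shape quoted above (full kernel output carries more
fields and a "Node N" prefix, which this does not handle). The function name
and the regex are illustrative, not part of any kernel tool.

```python
import re

# Matches the abbreviated zone lines quoted above, e.g.
#   [ 2245.707031] DMA32 free:0kB min:6724kB
# Assumption: only the "free" and "min" fields are present and needed.
ZONE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return (timestamp, zone, free_kb, min_kb) for zones whose free < min."""
    hits = []
    for line in log_text.splitlines():
        m = ZONE_RE.search(line)
        if m is None:
            continue  # not a zone summary line
        free_kb = int(m.group(3))
        min_kb = int(m.group(4))
        if free_kb < min_kb:
            hits.append((float(m.group(1)), m.group(2), free_kb, min_kb))
    return hits


sample = """\
[ 2245.660712] DMA free:4kB min:32kB
[ 2245.707031] DMA32 free:0kB min:6724kB
[ 2245.757597] Normal free:24kB min:928kB
"""

for ts, zone, free_kb, min_kb in zones_below_min(sample):
    print(f"{ts:.6f} {zone}: free {free_kb}kB is below min {min_kb}kB")
```

Every zone in the first snippet is already below its min watermark, which is
the state the commit above is meant to avoid reaching via writeback.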

Since you think you saw OOM messages with the older kernels, I assume that the OOM
killer was invoked on your 4.6.2 kernel. The OOM reaper in Linux 4.6 and Linux 4.7
will not help if the OOM-killed process was between down_write(&mm->mmap_sem) and
up_write(&mm->mmap_sem), because the reaper needs to take mmap_sem for read.

I was not able to confirm whether the OOM-killed process (I guess it was java)
was holding mm->mmap_sem for write, because /proc/sys/kernel/hung_task_warnings
dropped to 0 before traces of the java threads were printed, or because the
console became unusable due to the "delayed: kcryptd_crypt, ..." line. Anyway,
I think that kmallocwd will report it.
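
Before reproducing, it may help to check how many hung task reports the kernel
will still print. A rough Python sketch that reads the counter named above; the
helper names and the path parameter are illustrative, and the file exists only
on kernels built with hung task detection. As far as I know, a value of -1
means the number of reports is unlimited.

```python
from pathlib import Path

HUNG_TASK_WARNINGS = "/proc/sys/kernel/hung_task_warnings"


def remaining_hung_task_warnings(path=HUNG_TASK_WARNINGS):
    """Return the remaining report count, or None if the file is unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(count):
    """Turn the raw counter into a short human-readable status."""
    if count is None:
        return "hung_task_warnings not available"
    if count < 0:
        return "hung task reports are unlimited"
    if count == 0:
        return "no hung task reports left; further stalls will not be traced"
    return f"{count} hung task report(s) remaining"


print(describe(remaining_hung_task_warnings()))
```

If the counter is already 0, writing a larger value to the same file as root
before starting the build should allow the traces of the stalled threads to be
printed.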

> > It is sad that we haven't merged kmallocwd which will report
> > which memory allocations are stalling
> >  ( http://lkml.kernel.org/r/1462630604-23410-1-git-send-email-penguin-kernel@xxxxxxxxxxxxxxxxxxx ).
> 
> Would you like me to try it?  It wouldn't prevent the hang, though,
> just print better debug output to serial console, right?
> Or would it OOM kill some process?

Yes, but for bisection purposes, please try commit 78ebc2f7146156f4 without
applying kmallocwd. If that commit helps avoid the flood of allocation
failure warnings, we can consider backporting it. If that commit does not
help, I think you are reporting a new location where we should not use
memory reserves.

kmallocwd will not OOM-kill any process, and it will not prevent the hang.
It just prints information about threads which are stalling inside a memory
allocation request.
