bisected bd4c82c22c367e is the first bad commit (was [Bug 198617] New: zswap causing random applications to crash)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

On (01/30/18 11:48), Andrew Morton wrote:
> Subject: [Bug 198617] New: zswap causing random applications to crash
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=198617
> 
>             Bug ID: 198617
>            Summary: zswap causing random applications to crash
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.14.15
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: kernel_org@xxxxxx
>         Regression: No
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=104709
> https://bugs.kde.org/show_bug.cgi?id=389542
> 
> I did have zswap enabled for a long while, and a lot of wine games,
> plasmashell, xorg, kwin_x11 (and other) did crash randomly when reached 100% of
> physical ram and swap was like almost never used.
> 
> I could esilly open a lot of browser tabs and the browser or xorg would fail
> every time.
> 
> After disabling zswap no crashes at all.
> 
> /etc/systemd/swap.conf
> zswap_enabled=1
> zswap_compressor=lz4      # lzo lz4
> zswap_max_pool_percent=25 # 1-99
> zswap_zpool=zbud          # zbud z3fold


So I did a number of tests and I confirm that under memory pressure
with frontswap enabled I do see segfaults and memory corruptions in
random user space applications.

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

kernel: urxvt[343]: segfault at 10 ip 00007fa56bd7d52b sp 00007ffc09783a40 error 4 in libc-2.26.so[7fa56bcfd000+1ae000]
 #0  0x00007fa56bd7d52b _int_malloc (libc.so.6)
 #1  0x00007fa56bd7f2f3 malloc (libc.so.6)
 #2  0x00007fa56b3d6097 n/a (libxcb.so.1)
 #3  0x00007fa56b3d64d8 n/a (libxcb.so.1)
 #4  0x00007fa56c921b79 n/a (libX11.so.6)
 #5  0x00007fa56c921ceb n/a (libX11.so.6)
 #6  0x00007fa56c921fdd _XEventsQueued (libX11.so.6)
 #7  0x00007fa56c913c49 XEventsQueued (libX11.so.6)
 #8  0x000055b35cfc3262 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt)
 #9  0x000055b35cfc910f _Z17ev_invoke_pendingv (urxvt)
 #10 0x000055b35cfc9c02 ev_run (urxvt)
 #11 0x000055b35cfa89b9 main (urxvt)
 #12 0x00007fa56bd1df4a __libc_start_main (libc.so.6)
 #13 0x000055b35cfac9da _start (urxvt)

 Stack trace of thread 351:
 #0  0x00007f5baaee7860 raise (libc.so.6)
 #1  0x00007f5baaee8ec9 abort (libc.so.6)
 #2  0x00007f5baaf30849 __malloc_assert (libc.so.6)
 #3  0x00007f5baaf34011 _int_malloc (libc.so.6)
 #4  0x00007f5baaf352f3 malloc (libc.so.6)
 #5  0x00007f5baaf71cad __alloc_dir (libc.so.6)
 #6  0x00007f5baaf71dbd opendir_tail (libc.so.6)
 #7  0x00007f5bab5bbac4 Perl_pp_open_dir (libperl.so)
 #8  0x00007f5bab55fec6 Perl_runops_standard (libperl.so)
 #9  0x00007f5bab4d9390 Perl_call_sv (libperl.so)
 #10 0x00005611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #11 0x00005611f0947acb _ZN9rxvt_term14init_resourcesEiPKPKc (urxvt)
 #12 0x00005611f0948da8 _ZN9rxvt_term5init2EiPKPKc (urxvt)
 #13 0x00005611f097a0af n/a (urxvt)
 #14 0x00007f5bab568259 Perl_pp_entersub (libperl.so)
 #15 0x00007f5bab55fec6 Perl_runops_standard (libperl.so)
 #16 0x00007f5bab4d9390 Perl_call_sv (libperl.so)
 #17 0x00005611f097e190 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #18 0x00005611f0939a77 _ZN9rxvt_term9key_pressER9XKeyEvent (urxvt)
 #19 0x00005611f093d77a _ZN9rxvt_term4x_cbER7_XEvent (urxvt)
 #20 0x00005611f09572e8 _ZN12rxvt_display8flush_cbERN2ev7prepareEi (urxvt)
 #21 0x00005611f095d10f _Z17ev_invoke_pendingv (urxvt)
 #22 0x00005611f095dc02 ev_run (urxvt)
 #23 0x00005611f093c9b9 main (urxvt)
 #24 0x00007f5baaed3f4a __libc_start_main (libc.so.6)
 #25 0x00005611f09409da _start (urxvt)


and so on.


However, the problem is not specific to 4.14.15 or 4.14.11.

I manages to track it down to 4.14 merge window, so we are basically
looking at 4.14-rc0+

The bisect log looks as follows:

git bisect start
# bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
# good: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good aae3dbb4776e7916b6cd442d00159bea27a695c1
# bad: [2f173d2688559a6f85643d38a2ad6f45eb420c42] KVM: x86: Fix immediate_exit handling for uninitialized AP
git bisect bad 2f173d2688559a6f85643d38a2ad6f45eb420c42
# bad: [d969443064abf2f51510559a5b01325eaabfcb1d] Merge tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad d969443064abf2f51510559a5b01325eaabfcb1d
# bad: [a0725ab0c7536076d5477264420ef420ebb64501] Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
git bisect bad a0725ab0c7536076d5477264420ef420ebb64501
# bad: [f92e3da18b7d5941468040af962c201235148301] Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad f92e3da18b7d5941468040af962c201235148301
# good: [1c9fe4409ce3e9c78b1ed96ee8ed699d4f03bf33] x86/mm: Document how CR4.PCIDE restore works
git bisect good 1c9fe4409ce3e9c78b1ed96ee8ed699d4f03bf33
# bad: [da99ecf117fce6570bd3989263d68ee0007e1249] mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
git bisect bad da99ecf117fce6570bd3989263d68ee0007e1249
# good: [f7b68046873724129798c405e1a4e326b409c08f] mm: use find_get_pages_range() in filemap_range_has_page()
git bisect good f7b68046873724129798c405e1a4e326b409c08f
# bad: [824f973904a1108806fa0fbe15dc93ee9ecd9e0a] userfaultfd: selftest: enable testing of UFFDIO_ZEROPAGE for shmem
git bisect bad 824f973904a1108806fa0fbe15dc93ee9ecd9e0a
# good: [98cc093cba1e925eb34963dedb5f1684f1bdb2f4] block, THP: make block_device_operations.rw_page support THP
git bisect good 98cc093cba1e925eb34963dedb5f1684f1bdb2f4
# bad: [fe490cc0fe9e6ee48cc48bb5dc463bc5f0f1428f] mm, THP, swap: add THP swapping out fallback counting
git bisect bad fe490cc0fe9e6ee48cc48bb5dc463bc5f0f1428f
# good: [3e14a57b2416b7c94189b95baffd673cf5e0d0a3] memcg, THP, swap: support move mem cgroup charge for THP swapped out
git bisect good 3e14a57b2416b7c94189b95baffd673cf5e0d0a3
# good: [d6810d730022016d9c0f389452b86b035dba1492] memcg, THP, swap: make mem_cgroup_swapout() support THP
git bisect good d6810d730022016d9c0f389452b86b035dba1492
# bad: [bd4c82c22c367e068acb1ec9ec02be2fac3e09e2] mm, THP, swap: delay splitting THP after swapped out
git bisect bad bd4c82c22c367e068acb1ec9ec02be2fac3e09e2
# first bad commit: [bd4c82c22c367e068acb1ec9ec02be2fac3e09e2] mm, THP, swap: delay splitting THP after swapped out


The suspected first bad commit is:

bd4c82c22c367e068acb1ec9ec02be2fac3e09e2 is the first bad commit
commit bd4c82c22c367e068acb1ec9ec02be2fac3e09e2
Author: Huang Ying
Date:   Wed Sep 6 16:22:49 2017 -0700

    mm, THP, swap: delay splitting THP after swapped out
    
    In this patch, splitting transparent huge page (THP) during swapping out
    is delayed from after adding the THP into the swap cache to after
    swapping out finishes.  After the patch, more operations for the
    anonymous THP reclaiming, such as writing the THP to the swap device,
    removing the THP from the swap cache could be batched.  So that the
    performance of anonymous THP swapping out could be improved.
    
    This is the second step for the THP swap support.  The plan is to delay
    splitting the THP step by step and avoid splitting the THP finally.
    
    With the patchset, the swap out throughput improves 42% (from about
    5.81GB/s to about 8.25GB/s) in the vm-scalability swap-w-seq test case
    with 16 processes.  At the same time, the IPI (reflect TLB flushing)
    reduced about 78.9%.  The test is done on a Xeon E5 v3 system.  The swap
    device used is a RAM simulated PMEM (persistent memory) device.  To test
    the sequential swapping out, the test case creates 8 processes, which
    sequentially allocate and write to the anonymous pages until the RAM and
    part of the swap device is used up.
    
    Link: http://lkml.kernel.org/r/20170724051840.2309-12-ying.huang@xxxxxxxxx

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux