> On Oct 28, 2019, at 1:14 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Mon, Oct 28, 2019 at 12:52:09PM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 12d61c69 Add linux-next specific files for 20191024
>> git tree: linux-next
>> console output: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_log.txt-3Fx-3D15a0fa97600000&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=6-TXLGQxJcK1GdwMwa51423Y221rRncNiC_T09O0OLc&e=
>> kernel config: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_.config-3Fx-3Dafb75fd8c9fd5ed8&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=GuFgLJZOb7jtjZ5mDbkVT_zqtiVW4Py13e6Oq5CFxgY&e=
>> dashboard link: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_bug-3Fextid-3Defb9e48b9fbdc49bb34a&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=pF1hv-zGR8F378weGq9zxCE5ibI2_73qweMB_KuaZLM&e=
>> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_repro.syz-3Fx-3D13a63dc4e00000&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=mI7ZOgrDWeG-p6vn2d_kj65a5g8J7exXJ2MIUUF84-w&e=
>>
>> The bug was bisected to:
>>
>> commit 9c61acffe2b8833152041f7b6a02d1d0a17fd378
>> Author: Song Liu <songliubraving@xxxxxx>
>> Date: Wed Oct 23 00:24:28 2019 +0000
>>
>> mm,thp: recheck each page before collapsing file THP
>>
>> bisection log: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_bisect.txt-3Fx-3D13eb6ec0e00000&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=YtSUy5Dtjo6tek7CvwzMTPL40BJwOC6rEom-AkVx0SM&e=
>> final crash: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_report.txt-3Fx-3D101b6ec0e00000&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=BvPJx3QSPHgsN12jSZci_MqW_VxYp-MZpQtogZjlJOo&e=
>> console output: https://urldefense.proofpoint.com/v2/url?u=https-3A__syzkaller.appspot.com_x_log.txt-3Fx-3D17eb6ec0e00000&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=dR8692q0_uaizy0jkrBJQM5k2hfm4CiFxYT8KaysFrg&m=YEaOe5RP2hLXAC4tKPLehAQsea0_3k3tI4DL32BcA-8&s=YPvxWpQDpk9MI9W6QCtxME64wmxL2CZ5ZtEkCn0nI0c&e=
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+efb9e48b9fbdc49bb34a@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Fixes: 9c61acffe2b8 ("mm,thp: recheck each page before collapsing file THP")
>>
>> INFO: task khugepaged:1084 blocked for more than 143 seconds.
>> Not tainted 5.4.0-rc4-next-20191024 #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> khugepaged D27568 1084 2 0x80004000
>> Call Trace:
>> context_switch kernel/sched/core.c:3384 [inline]
>> __schedule+0x94a/0x1e70 kernel/sched/core.c:4069
>> schedule+0xd9/0x260 kernel/sched/core.c:4136
>> io_schedule+0x1c/0x70 kernel/sched/core.c:5780
>> wait_on_page_bit_common mm/filemap.c:1175 [inline]
>> __lock_page+0x422/0xab0 mm/filemap.c:1383
>> lock_page include/linux/pagemap.h:480 [inline]
>> mpage_prepare_extent_to_map+0xb3f/0xf90 fs/ext4/inode.c:2668
>> ext4_writepages+0xb6a/0x2e70 fs/ext4/inode.c:2866
>> ? 0xffffffff81000000
>> do_writepages+0xfa/0x2a0 mm/page-writeback.c:2344
>> __filemap_fdatawrite_range+0x2bc/0x3b0 mm/filemap.c:421
>> __filemap_fdatawrite mm/filemap.c:429 [inline]
>> filemap_flush+0x24/0x30 mm/filemap.c:456
>
> This is a double locking deadlock. The page lock is already held when
> we call into filemap_flush() here, and does another lock_page() in
> write_cache_pages().
>
> To fix it, we have to either initiate flushing before acquiring the
> page lock, or simply skip over dirty pages.
>
> Maybe doing vfs_fsync_range() from the madvise(HUGEPAGE) call isn't a
> bad idea after all? (I had discussed this with Song off-list before.)

Thanks syzbot and Johannes!

I just sent a quick fix, that just removes filemap_flush(). I will work on a better mechanism to flush the file.

Thanks,
Song
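
For anyone reading this thread later, the lock ordering problem Johannes describes, and the two fixes he suggests, can be sketched in a small userspace model. This is only an illustration and not kernel code: every name below (toy_page, toy_flush, toy_flush_under_lock, and so on) is hypothetical, and a failed trylock stands in for the point where the real lock_page() would sleep forever. The quick fix mentioned above simply removes the filemap_flush() call; the model does not reproduce that patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for the toy model. */
enum toy_result { TOY_OK, TOY_SKIPPED_DIRTY, TOY_WOULD_DEADLOCK };

/* Toy stand-in for a page cache page; not the kernel's struct page. */
struct toy_page {
    bool locked;
    bool dirty;
};

/* Non-recursive lock. Returns false where a real lock_page() would block. */
bool toy_trylock(struct toy_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

void toy_unlock(struct toy_page *p)
{
    p->locked = false;
}

/* Toy writeback: must take each dirty page's lock before cleaning it.
 * Returns false if it runs into a dirty page that is already locked. */
bool toy_flush(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        if (!toy_trylock(&pages[i]))
            return false;
        pages[i].dirty = false;
        toy_unlock(&pages[i]);
    }
    return true;
}

/* Problem ordering: flush while already holding the page lock. */
enum toy_result toy_flush_under_lock(struct toy_page *pages, size_t n, size_t idx)
{
    enum toy_result res = TOY_OK;

    assert(idx < n);
    if (!toy_trylock(&pages[idx]))
        return TOY_WOULD_DEADLOCK;
    if (pages[idx].dirty && !toy_flush(pages, n))
        res = TOY_WOULD_DEADLOCK; /* flush wants the lock we hold */
    toy_unlock(&pages[idx]);
    return res;
}

/* Option 1: initiate flushing before acquiring the page lock. */
enum toy_result toy_flush_then_lock(struct toy_page *pages, size_t n, size_t idx)
{
    assert(idx < n);
    if (!toy_flush(pages, n))
        return TOY_WOULD_DEADLOCK;
    if (!toy_trylock(&pages[idx]))
        return TOY_WOULD_DEADLOCK;
    /* work on the locked, clean page would go here */
    toy_unlock(&pages[idx]);
    return TOY_OK;
}

/* Option 2: take the lock, but skip over dirty pages without flushing. */
enum toy_result toy_skip_dirty(struct toy_page *pages, size_t n, size_t idx)
{
    enum toy_result res;

    assert(idx < n);
    if (!toy_trylock(&pages[idx]))
        return TOY_WOULD_DEADLOCK;
    res = pages[idx].dirty ? TOY_SKIPPED_DIRTY : TOY_OK;
    toy_unlock(&pages[idx]);
    return res;
}
```

In this model, calling toy_flush_under_lock() on a dirty page reports TOY_WOULD_DEADLOCK, because the flush path needs the very lock the caller holds. The other two variants never flush with a page lock held, at the cost of either flushing up front or leaving dirty pages for a later pass.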