On Wed, Sep 29, 2021 at 04:41:48PM -0700, Song Liu wrote:
> The issue is NOT caused by concurrent khugepaged:collapse_file() and
> truncate_pagecache(inode, 0). With some printks, we can see a clear
> time gap (>2 second ) between collapse_file() finishes, and
> truncate_pagecache() (which crashes soon). Therefore, my earlier
> suggestion that adds deny_write_access() to collapse_file() does NOT
> work.
>
> The crash is actually caused by concurrent truncate_pagecache(inode, 0).
> If I change the number of write thread in stress_madvise_dso.c to one,
> (IOW, one thread_read and one thread_write), I cannot reproduce the
> crash anymore.
>
> I think this means we cannot fix this issue in collapse_file(), because it
> finishes long before the crash.

Ah!  So are we missing one or more of these locks:

	inode_lock(inode);
	filemap_invalidate_lock(mapping);

in the open path?