On Wed, 15 Apr 2020, Hugh Dickins wrote:
> On Wed, 15 Apr 2020, Yang Shi wrote:
> > On Wed, Apr 15, 2020 at 7:04 PM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > > On Mon, 13 Apr 2020, Yang Shi wrote:
> > > >
> > > > It looks shmem_uncharge() is just called by __split_huge_page() and
> > > > collapse_file(). The collapse_file() has acquired xa_lock with irq
> > > > disabled before acquiring info->lock, so it is safe.
> > > > __split_huge_page() is called with holding xa_lock with irq enabled,
> > > > but lru_lock is acquired with irq disabled before acquiring xa_lock.
> > > >
> > > > So, it is unnecessary to acquire info->lock with irq disabled in
> > > > shmem_uncharge(). Can syzbot try the below patch?
> > >
> > > But I disagree with the patch below. You're right that IRQ-disabling
> > > here is unnecessary, given its two callers; but I'm not sure that we
> > > want it to look different from shmem_charge() and all other info->lock
> > > takers; and, more importantly, I don't see how removing the redundant
> > > IRQ-saving below could make it any less liable to deadlock.
> >
> > Yes, I realized the patch can't suppress the lockdep splat. But,
> > actually I didn't understand how this deadlock could happen because
> > info_lock is acquired with IRQ disabled before acquiring
> > user_shm_lock. So, interrupt can't come in at all if I didn't miss
> > anything.
>
> I think the story it's trying to tell is this (but, like most of us,
> I do find Mr Lockdep embarrassingly difficult to understand; and I'm
> not much good at drawing race diagrams either):
>
> CPU0 was in user_shm_unlock(), it's got shmlock_user_lock, then an
> interrupt comes in. It's an endio kind of interrupt, which goes off
> to test_clear_page_writeback(), which wants the xa_lock on i_pages.
>
> Meanwhile, CPU1 was doing some SysV SHM locking, it's got as far as
> shmem_lock(), it has acquired info->lock, and goes off to user_shm_lock()
> which wants shmlock_user_lock.
>
> But sadly, CPU2 is splitting a shmem THP, calling shmem_uncharge()
> that wants info->lock while outer level holds xa_lock on i_pages:
> with interrupts properly disabled, but that doesn't help.
>
> Now, that story doesn't quite hold up as a deadlock, because shmem
> doesn't use writeback tags; and (unless you set shmem_enabled "force")
> I don't think there's a way to get shmem THPs in SysV SHM (and are
> they hole-punchable? maybe through MADV_REMOVE); so it looks like
> we're talking about different inodes.
>
> But lockdep is right to report it, and more thought might arrive at
> a more convincing scenario. Anyway, easily fixed and best fixed.
>
> (But now I think my patch must wait until tomorrow.)

https://lore.kernel.org/lkml/alpine.LSU.2.11.2004161707410.16322@eggly.anvils/
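
For readers who find the three-CPU story quoted above easier to follow as a graph, here is a rough userspace sketch. It is not kernel code and not how lockdep is implemented; it is only an illustration. Each "X held while wanting Y" step from the story is recorded as an edge, and a depth-first search looks for a cycle. The enum values, struct lock_graph and every function name here are made up for the example; only the three lock names come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this model; names mirror the thread. */
enum lock_class { SHMLOCK_USER_LOCK, XA_LOCK, INFO_LOCK, NLOCKS };

/* dep[a][b] is true when b was wanted while a was already held. */
struct lock_graph {
	bool dep[NLOCKS][NLOCKS];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

void add_dependency(struct lock_graph *g, enum lock_class held,
		    enum lock_class wanted)
{
	assert(held < NLOCKS && wanted < NLOCKS);
	g->dep[held][wanted] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = done. */
static bool visit(const struct lock_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NLOCKS; next++) {
		if (!g->dep[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: ordering cycle */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

bool has_cycle(const struct lock_graph *g)
{
	int state[NLOCKS] = { 0 };

	for (int n = 0; n < NLOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}

/* Record the three steps of the story, one edge per CPU. */
void build_reported_chain(struct lock_graph *g)
{
	graph_init(g);
	/* CPU0: holds shmlock_user_lock, interrupt wants xa_lock */
	add_dependency(g, SHMLOCK_USER_LOCK, XA_LOCK);
	/* CPU1: holds info->lock, wants shmlock_user_lock */
	add_dependency(g, INFO_LOCK, SHMLOCK_USER_LOCK);
	/* CPU2: holds xa_lock, shmem_uncharge() wants info->lock */
	add_dependency(g, XA_LOCK, INFO_LOCK);
}
```

Calling build_reported_chain() and then has_cycle() reports a cycle, which matches the shape of the chain in the story. Removing any one of the three edges makes has_cycle() return false. The model knows nothing about inodes or IRQ state, so it cannot capture the "different inodes" caveat raised above.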