On Tue, Oct 22, 2019 at 03:24:22PM -0700, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 22 Oct 2019 09:02:22 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=205135
> >
> > --- Comment #7 from goodmirek@xxxxxxxxxxxxx ---
> > Everyone who uses a swapfile on XFS filesystem seem affected by this hang up.
> > Not sure about other filesystems, I did not have a chance to test it elsewhere.
> >
> > This unreproduced bot crash could be related:
> > https://lore.kernel.org/linux-mm/20190910071804.2944-1-hdanton@xxxxxxxx/
>
> Thanks. Might be core MM, might be XFS, might be Fedora.
>
> Hilf, does your patch look related? That seems to have gone quiet?
>
> Should we progress Tetsuo's patch?

Hmm...

Oct 09 15:44:52 kernel: Linux version 5.4.0-0.rc1.git1.1.fc32.x86_64 (mockbuild@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Fri Oct 4 14:57:23 UTC 2019

...istr 5.4-rc1 had some writeback bugs in it...

-> #1 (fs_reclaim){+.+.}:
Oct 09 13:47:08 kernel:        fs_reclaim_acquire.part.0+0x25/0x30
Oct 09 13:47:08 kernel:        __kmalloc+0x4f/0x330
Oct 09 13:47:08 kernel:        kmem_alloc+0x83/0x1a0 [xfs]
Oct 09 13:47:08 kernel:        kmem_alloc_large+0x3c/0x100 [xfs]
Oct 09 13:47:08 kernel:        xfs_attr_copy_value+0x5d/0xa0 [xfs]
Oct 09 13:47:08 kernel:        xfs_attr_get+0xe7/0x1d0 [xfs]
Oct 09 13:47:08 kernel:        xfs_get_acl+0xad/0x1e0 [xfs]
Oct 09 13:47:08 kernel:        get_acl+0x81/0x110
Oct 09 13:47:08 kernel:        posix_acl_create+0x58/0x160
Oct 09 13:47:08 kernel:        xfs_generic_create+0x7e/0x2f0 [xfs]
Oct 09 13:47:08 kernel:        lookup_open+0x5bd/0x820
Oct 09 13:47:08 kernel:        path_openat+0x340/0xcb0
Oct 09 13:47:08 kernel:        do_filp_open+0x91/0x100
Oct 09 13:47:08 kernel:        do_sys_open+0x184/0x220
Oct 09 13:47:08 kernel:        do_syscall_64+0x5c/0xa0
Oct 09 13:47:08 kernel:        entry_SYSCALL_64_after_hwframe+0x49/0xbe

That's XFS trying to allocate memory to load an acl off disk, only it
looks like this thread does a MAYFAIL allocation. It's a GFP_FS
allocation (since we don't set KM_NOFS), so we recurse into fs reclaim,
and the ACL-getter has locked the inode (which is probably why lockdep
triggers).

I wonder if that's really a deadlock vs. just super-slow behavior, but
otoh I don't think we're supposed to allow reclaim to jump into the
filesystems when the fs has locks held.

That kmem_alloc_large call should probably be changed to use KM_NOFS.
Dave?

--D
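
For illustration only, here is a toy user-space model of the recursion described above. It is a sketch under stated assumptions, not kernel code: every name in it (alloc_mode, inode_model, fs_reclaim_model, alloc_under_pressure, acl_read_makes_progress) is hypothetical, and it assumes that reclaim needs the same inode lock the ACL getter already holds. It also simplifies by treating a NOFS-style allocation as always succeeding, which a real allocation under memory pressure need not do.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
enum alloc_mode {
    ALLOC_FS_OK,   /* reclaim may call into the filesystem (GFP_FS-like) */
    ALLOC_NOFS     /* reclaim must stay out of the filesystem (KM_NOFS-like) */
};

struct inode_model {
    bool locked;   /* true while the ACL getter holds the inode lock */
};

/*
 * Models filesystem reclaim that needs the inode lock to free memory.
 * Returns false if it would have to wait on a lock that is already held.
 */
static bool fs_reclaim_model(const struct inode_model *ip)
{
    return !ip->locked;
}

/*
 * Models an allocation under memory pressure. Only ALLOC_FS_OK is
 * allowed to fall into filesystem reclaim. Simplification: a NOFS
 * allocation is assumed to succeed without entering the filesystem.
 */
static bool alloc_under_pressure(const struct inode_model *ip,
                                 enum alloc_mode mode)
{
    if (mode == ALLOC_NOFS)
        return true;
    return fs_reclaim_model(ip);
}

/* Returns true if reading the ACL with the inode locked can make progress. */
bool acl_read_makes_progress(enum alloc_mode mode)
{
    struct inode_model ip = { .locked = true };
    return alloc_under_pressure(&ip, mode);
}
```

In this model, acl_read_makes_progress(ALLOC_FS_OK) returns false because reclaim wants a lock the caller already holds, while acl_read_makes_progress(ALLOC_NOFS) returns true. That is the ordering lockdep is complaining about in the trace.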