Hi Dave / Others,

It appears upgrading to 4.17+ has indeed fixed the deadlock issue, or at
least no deadlocks are occurring now. There are segfaults in xfs_db
appearing now, though. I am attempting to get the full syslog; here is an
example. Thoughts?

[Thu Nov 21 10:43:20 2019] xfs_db[13076]: segfault at 12ff6001 ip 0000000000407922 sp 00007ffe1a27b2e0 error 4 in xfs_db[400000+8a000]
[Thu Nov 21 10:43:20 2019] Code: 89 cc 55 48 89 d5 53 48 89 f3 48 83 ec 48 0f b6 57 01 44 0f b6 4f 02 64 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 0f b6 07 <44> 0f b6 57 0d 48 8d 74 24 10 c1 e2 10 41 c1 e1 08 c1 e0 18 41 c1

Thanks so much in advance!
Andrew

On Wed, Nov 20, 2019 at 10:43 AM Andrew Carr <andrewlanecarr@xxxxxxxxx> wrote:
>
> Genius Dave, Thanks so much!
>
> On Tue, Nov 19, 2019 at 3:21 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Nov 19, 2019 at 10:49:56AM -0500, Andrew Carr wrote:
> > > Dave / Eric / Others,
> > >
> > > Syslog: https://pastebin.com/QYQYpPFY
> > >
> > > Dmesg: https://pastebin.com/MdBCPmp9
> >
> > which shows no stack traces, again.
> >
> > Anyway, you've twiddled mkfs knobs on these filesystems, and that
> > is the likely cause of the issue: the filesystem is using 64k
> > directory blocks - the allocation size is larger than 64kB:
> >
> > [Sun Nov 17 21:40:05 2019] XFS: nginx(31293) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x250)
> >
> > Upstream fixed this some time ago:
> >
> > $ ▶ gl -n 1 -p cb0a8d23024e
> > commit cb0a8d23024e7bd234dea4d0fc5c4902a8dda766
> > Author: Dave Chinner <dchinner@xxxxxxxxxx>
> > Date: Tue Mar 6 17:03:28 2018 -0800
> >
> >     xfs: fall back to vmalloc when allocation log vector buffers
> >
> >     When using large directory blocks, we regularly see memory
> >     allocations of >64k being made for the shadow log vector buffer.
> >     When we are under memory pressure, kmalloc() may not be able to find
> >     contiguous memory chunks large enough to satisfy these allocations
> >     easily, and if memory is fragmented we can potentially stall here.
> >
> >     TO avoid this problem, switch the log vector buffer allocation to
> >     use kmem_alloc_large(). This will allow failed allocations to fall
> >     back to vmalloc and so remove the dependency on large contiguous
> >     regions of memory being available. This should prevent slowdowns
> >     and potential stalls when memory is low and/or fragmented.
> >
> >     Signed-Off-By: Dave Chinner <dchinner@xxxxxxxxxx>
> >     Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >     Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@xxxxxxxxxxxxx
>
> --
> With Regards,
> Andrew Carr
>
> e. andrewlanecarr@xxxxxxxxx
> w. andrew.carr@xxxxxxxxxxxxx
> c. 4239489206
> a. P.O. Box 1231, Greeneville, TN, 37744

--
With Regards,
Andrew Carr

e. andrewlanecarr@xxxxxxxxx
w. andrew.carr@xxxxxxxxxxxxx
c. 4239489206
a. P.O. Box 1231, Greeneville, TN, 37744
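
P.S. While waiting on the full syslog, the segfault line quoted at the top of
this message can be picked apart by hand. Below is a rough sketch, not
anything from xfsprogs: the helper name decode_segfault and the regex are my
own, and it assumes the usual x86 "segfault at ... ip ... sp ... error ... in
obj[base+size]" layout with the standard page-fault error bits (bit 0 =
protection violation vs. page not present, bit 1 = write vs. read, bit 2 =
user vs. kernel mode).

```python
import re

# Hypothetical helper: matches the x86 kernel "segfault at" log line.
_SEGFAULT = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r" in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a segfault log line into its fields and decode the error bits."""
    m = _SEGFAULT.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    line = (
        "[Thu Nov 21 10:43:20 2019] xfs_db[13076]: segfault at 12ff6001 "
        "ip 0000000000407922 sp 00007ffe1a27b2e0 error 4 in xfs_db[400000+8a000]"
    )
    info = decode_segfault(line)
    print(hex(info["offset"]), info["mode"], info["access"], info["cause"])
```

For the line above this gives offset 0x7922 into the xfs_db mapping, and
"error 4" reads as a user-mode read of a page that is not mapped. If a
matching xfs_db with debug info is at hand, and assuming it is a non-PIE
build (the 400000 base suggests so, but I have not confirmed it), something
like "addr2line -e xfs_db 0x407922" may point at the source line.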