On Tue, 2022-01-04 at 12:22 +1100, Dave Chinner wrote:
> On Tue, Jan 04, 2022 at 12:04:23AM +0000, Trond Myklebust wrote:
> > On Tue, 2022-01-04 at 09:03 +1100, Dave Chinner wrote:
> > > On Sat, Jan 01, 2022 at 05:39:45PM +0000, Trond Myklebust wrote:
> > > > On Sat, 2022-01-01 at 14:55 +1100, Dave Chinner wrote:
> > > > > As it is, if you are getting soft lockups in this location,
> > > > > that's an indication that the ioend chain that is being built
> > > > > by XFS is way, way too long. IOWs, the completion latency
> > > > > problem is caused by a lack of submit side ioend chain length
> > > > > bounding in combination with unbound completion side merging
> > > > > in xfs_end_bio - it's not a problem with the generic iomap
> > > > > code....
> > > > >
> > > > > Let's try to address this in the XFS code, rather than hack
> > > > > unnecessary band-aids over the problem in the generic code...
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Dave.
> > > >
> > > > Fair enough. As long as someone is working on a solution, then
> > > > I'm happy. Just a couple of things:
> > > >
> > > > Firstly, we've verified that the cond_resched() in the bio loop
> > > > does suffice to resolve the issue with XFS, which would tend to
> > > > confirm what you're saying above about the underlying issue
> > > > being the ioend chain length.
> > > >
> > > > Secondly, note that we've tested this issue with a variety of
> > > > older kernels, including 4.18.x, 5.1.x and 5.15.x, so please
> > > > bear in mind that it would be useful for any fix to be backward
> > > > portable through the stable mechanism.
> > >
> > > The infrastructure hasn't changed that much, so whatever the
> > > result is it should be backportable.
> > >
> > > As it is, is there a specific workload that triggers this issue?
> > > Or a specific machine config (e.g. large memory, slow storage)?
> > > Are there large fragmented files in use (e.g. randomly written VM
> > > image files)? There are a few factors that can exacerbate the
> > > ioend chain lengths, so it would be handy to have some idea of
> > > what is actually triggering this behaviour...
> > >
> > > Cheers,
> > >
> > > Dave.
> >
> > We have different reproducers. The common feature appears to be the
> > need for a decently fast box with fairly large memory (128GB in one
> > case, 400GB in the other). It has been reproduced with HDs, SSDs
> > and NVMe systems.
> >
> > On the 128GB box, we had it set up with 10+ disks in a JBOD
> > configuration and were running the AJA system tests.
> >
> > On the 400GB box, we were just serially creating large (> 6GB)
> > files using fio, and that was occasionally triggering the issue.
> > However, doing an strace of that workload to disk reproduced the
> > problem faster :-).
>
> Ok, that matches up with the "lots of logically sequential dirty
> data on a single inode in cache" vector that is required to create
> really long bio chains on individual ioends.
>
> Can you try the patch below and see if it addresses the issue?
>

That patch does seem to fix the soft lockups.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
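
[Editorial sketch, for readers following the thread: the "cond_resched() in the bio loop" band-aid mentioned above refers to yielding the CPU inside the ioend completion walk, which on these kernels runs in workqueue (process) context. The sketch below is illustrative only, not the patch under discussion; the structure name my_ioend, the io_bio_list field, and example_finish_one_bio() are made-up stand-ins for the real iomap/XFS types, whose layout differs across the 4.18.x-5.15.x range mentioned above.]

	#include <linux/bio.h>
	#include <linux/sched.h>

	/*
	 * Illustrative only: complete every bio hanging off one ioend.
	 * If the chain is very long, doing this in one uninterrupted
	 * loop can hog the CPU long enough to trip the soft lockup
	 * watchdog; cond_resched() between bios bounds that latency.
	 */
	static void example_finish_ioend_bios(struct my_ioend *ioend, int error)
	{
		struct bio *bio, *next;

		for (bio = ioend->io_bio_list; bio; bio = next) {
			/* assumed chain linkage via bi_private */
			next = bio->bi_private;

			example_finish_one_bio(bio, error);

			/*
			 * Band-aid discussed in this thread: give the
			 * scheduler a chance to run between bios.
			 */
			cond_resched();
		}
	}

[The patch Dave attached presumably takes the approach he argues for earlier in the thread instead: bounding the ioend chain length on the XFS submit side so that completion-side work can never grow unbounded in the first place.]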