On Mon, Oct 28, 2019 at 08:30:03AM +0100, Christoph Hellwig wrote:
> On Sun, Oct 27, 2019 at 11:32:32AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 25, 2019 at 12:24:04PM +0200, Christoph Hellwig wrote:
> > > Hi Darrick,
> > >
> > > the current xfs tree seems to easily cause softlockups in generic/175
> > > (and a few other tests, but not as reproducible) for me.  This is on
> > > 20GB 4k block size images on a VM with 4 CPUs and 4G of RAM.
> >
> > Hrm.  I haven't seen that before... what's your kernel config?
> > This looks like some kind of lockup in slub debugging...?
> >
> > Also, is this a new thing?  Or something that used to happen with low
> > frequency but has slowly increased to the point that it's annoying?
> >
> > (Or something else?)
>
> Seems to happen with 5.3 as well.  I only recently turned
> CONFIG_XFS_ONLINE_SCRUB back on in my usual test config, that is what
> made it show up.
>
> .config attached.

Aha, you have preempt disabled and slub debugging on by default, which
(on the million-extent files produced by generic/175) means that scrub
takes long enough to trip the soft lockup watchdog while checking the
bmap.

The test eventually finishes, but the obvious(ly stupid) bandaid of
calling touch_softlockup_watchdog merely plunged the VM into "rcu_sched
self-detected stall on CPU" messages, and as it's late I'll set it aside
until tomorrow.

IOWs I think I know what's going on but don't yet know how to fix it. :/

--D
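
To illustrate why touching the watchdog only trades one warning for
another, here is a toy userspace model in Python. It is not kernel code
and not the fix for this report. ToyCpu, scan, the thresholds, and the
yield_cpu step are all hypothetical names and values made up for this
sketch. The assumption it encodes is that a long loop on a
non-preemptible CPU is watched by two independent detectors, and that
touch_softlockup_watchdog resets only one of them, whereas a voluntary
reschedule point (cond_resched() is the usual kernel primitive for
that) would let both see progress. Whether that is the right fix here
is not stated in this thread.

```python
from dataclasses import dataclass, field

# Arbitrary toy thresholds in loop iterations, not real kernel defaults.
SOFTLOCKUP_THRESH = 20
RCU_STALL_THRESH = 20


@dataclass
class ToyCpu:
    """Hypothetical model of one non-preemptible CPU (not kernel state)."""
    now: int = 0
    last_watchdog_touch: int = 0
    last_quiescent_state: int = 0
    warnings: set = field(default_factory=set)

    def tick(self):
        """Advance time by one unit of scrub work and run both detectors."""
        self.now += 1
        if self.now - self.last_watchdog_touch > SOFTLOCKUP_THRESH:
            self.warnings.add("soft lockup")
        if self.now - self.last_quiescent_state > RCU_STALL_THRESH:
            self.warnings.add("rcu stall")

    def touch_watchdog(self):
        """Models touch_softlockup_watchdog(): resets one timestamp only."""
        self.last_watchdog_touch = self.now

    def yield_cpu(self):
        """Models a voluntary reschedule: both detectors see progress."""
        self.last_watchdog_touch = self.now
        self.last_quiescent_state = self.now


def scan(n_extents, every=0, action=None):
    """Walk n_extents records, optionally calling `action` every `every` records."""
    cpu = ToyCpu()
    for i in range(1, n_extents + 1):
        cpu.tick()
        if action and every and i % every == 0:
            getattr(cpu, action)()
    return sorted(cpu.warnings)


if __name__ == "__main__":
    print(scan(1000))                                     # no mitigation
    print(scan(1000, every=10, action="touch_watchdog"))  # the bandaid
    print(scan(1000, every=10, action="yield_cpu"))       # voluntary yield
```

In this model the unmitigated scan trips both detectors, the bandaid
leaves only the RCU stall, and the yielding scan stays quiet, which
matches the sequence of symptoms described above.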