On Wed, May 21, 2014 at 12:30 PM, John Blackwood <john.blackwood@xxxxxxxx> wrote:
>> Date: Wed, 21 May 2014 03:33:49 -0400
>> From: Richard Weinberger <richard.weinberger@xxxxxxxxx>
>> To: Austin Schuh <austin@xxxxxxxxxxxxxxxx>
>> CC: LKML <linux-kernel@xxxxxxxxxxxxxxx>, xfs <xfs@xxxxxxxxxxx>, rt-users
>> <linux-rt-users@xxxxxxxxxxxxxxx>
>> Subject: Re: Filesystem lockup with CONFIG_PREEMPT_RT
>
>>
>> CC'ing RT folks
>>
>> On Wed, May 21, 2014 at 8:23 AM, Austin Schuh <austin@xxxxxxxxxxxxxxxx>
>> wrote:
>> > > On Tue, May 13, 2014 at 7:29 PM, Austin Schuh
>> > > <austin@xxxxxxxxxxxxxxxx> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I am observing a filesystem lockup with XFS on a CONFIG_PREEMPT_RT
>> >> >> patched kernel. I have currently only triggered it using dpkg.
>> >> >> Dave Chinner on the XFS mailing list suggested that it was a rt-kernel
>> >> >> workqueue issue as opposed to a XFS problem after looking at the
>> >> >> kernel messages.
>> >> >>
>> >> >> The only modification to the kernel besides the RT patch is that I
>> >> >> have applied tglx's "genirq: Sanitize spurious interrupt detection
>> >> >> of threaded irqs" patch.
>> > >
>> > > I upgraded to 3.14.3-rt4, and the problem still persists.
>> > >
>> > > I turned on event tracing and tracked it down further. I'm able to
>> > > lock it up by scping a new kernel debian package to /tmp/ on the
>> > > machine. scp is locking the inode, and then scheduling
>> > > xfs_bmapi_allocate_worker in the work queue. The work then never gets
>> > > run. The kworkers then lock up waiting for the inode lock.
>> > >
>> > > Here are the relevant events from the trace. ffff8803e9f10288
>> > > (blk_delay_work) gets run later on in the trace, but ffff8803b4c158d0
>> > > (xfs_bmapi_allocate_worker) never does. The kernel then warns about
>> > > blocked tasks 120 seconds later.
>
> Austin and Richard,
>
> I'm not 100% sure that the patch below will fix your problem, but we
> saw something that sounds pretty familiar to your issue involving the
> nvidia driver and the preempt-rt patch. The nvidia driver uses the
> completion support to create their own driver's notion of an internally
> used semaphore.
>
> Some tasks were failing to ever wakeup from wait_for_completion() calls
> due to a race in the underlying do_wait_for_common() routine.

Hi John,

Thanks for the suggestion and patch. The issue is that the work never
gets run, not that the work finishes but the waiter never gets woken.
I applied it anyways to see if it helps, but I still get the lockup.

Thanks,
Austin