On Sat, Apr 08, 2017 at 07:26:14AM -0400, David Shaw wrote:
> On Apr 7, 2017, at 1:39 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> >
> > On Fri, Apr 07, 2017 at 12:56:48PM -0400, David Shaw wrote:
> >> On Apr 7, 2017, at 10:56 AM, Emmanuel Florac <eflorac@xxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Thu, 6 Apr 2017 14:41:45 -0400
> >>> David Shaw <dshaw@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> I'm having a problem with processes getting "stuck" when accessing an
> >>>> XFS (v5) filesystem. When it happens, I start getting the "blocked
> >>>> for more than 120 seconds" error, and the process stays in that state
> >>>> until I reboot. The kernel is 3.10.0-514.2.2 and the xfsprogs is
> >>>> 4.5.0-9 (both CentOS 7.3).
> >>>>
> >>>
> >>> Could it be that your system is under high IO load?
> >>
> >> It's possible that the problem is instigated or made worse by high load, but once the processes enter D state, they stay there even when the system is idle. They stay in D state until I reboot.
> >>
> >
> > There isn't enough information provided to suggest the filesystem is
> > locked up as opposed to waiting for (very slow) I/O, as suggested by
> > Emmanuel.
>
> Ah, I didn't understand, thanks.
>
> > If the filesystem appears to be deadlocked, can you provide the complete
> > hung task output (echo w > /proc/sysrq-trigger) as well as any activity
> > that might be shown by tracepoints if enabled when in this state
> > (trace-cmd start -e xfs:*; cat /sys/kernel/debug/tracing/trace_pipe)?
>
> I have the hung task output from a recent time this happened (http://www.jabberwocky.com/xfs/blocked.txt). In it, "servxfs" is the process that reads from the XFS filesystem. It's the process feeding a fuse filesystem. "mmon" and "smbd" are both trying to access the fuse filesystem. I suspect that they're blocking because the servxfs threads are blocking and thus not fulfilling the fuse requests.
>

Interesting, nothing obvious stands out to me.
It looks like you have smbd waiting on a pread over fuse and mmon
waiting on a stat. Presumably the smbd pread corresponds to the blocked
servxfs pread and the stat to one of the servxfs getxattr calls. Any
idea where the other servxfs getxattr comes in?

> I will get the xfs traces the next time it happens. It seems to be happening 2-3 times a week, but frustratingly, I can't make it happen on demand. Is there something I should look for in particular in the trace output, or some amount of time to capture it for?
>

Can you elaborate on the resulting behavior? Are the same processes
always involved? Can you identify whether they are attempting to access
the same file(s) or not? Also, does the underlying filesystem continue
to function outside of these processes, which all seem to be blocked on
reads? E.g., can you read another file from the XFS fs or is all I/O
blocked?

Because there are multiple layers involved here, presumably with custom
code in between (e.g., your fuse userspace), this might be easier to
reason about if you can dig more into what's blocked in the upper layers
to describe precisely what high-level requests are active at the XFS
level.

Brian

> Thanks again for your help,
>
> David
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
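
To help answer the questions above about which processes are involved
each time, here is a minimal sketch that lists every thread currently in
D state along with its kernel wait channel. It is not part of the
original thread. It assumes a Linux procfs that exposes
/proc/<pid>/task/<tid>/stat and .../wchan, and the names parse_stat and
blocked_tasks are hypothetical helpers made up for illustration. It
complements the sysrq-w output and does not replace it.

```python
import os


def parse_stat(text):
    """Split a /proc/.../stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' instead of on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """Yield (tid, comm, wchan) for each thread in D state.

    Assumption: every numeric directory under `proc` is a process with a
    task/ subdirectory holding one directory per thread.
    """
    try:
        pids = [p for p in os.listdir(proc) if p.isdigit()]
    except OSError:
        return  # no procfs here
    for pid in pids:
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            try:
                with open(os.path.join(base, "stat")) as f:
                    tid_num, comm, state = parse_stat(f.read())
                if state != "D":
                    continue
                with open(os.path.join(base, "wchan")) as f:
                    wchan = f.read().strip()
            except OSError:
                continue  # thread exited or file not readable
            yield tid_num, comm, wchan


if __name__ == "__main__":
    for tid, comm, wchan in blocked_tasks():
        print(tid, comm, wchan)
```

Running this a few times while the hang is in progress, and again on the
next occurrence, should show whether the same threads and wait channels
come up each time.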