Re: Processes stuck in D state when accessing XFSv5 filesystem

On Sat, Apr 08, 2017 at 07:26:14AM -0400, David Shaw wrote:
> On Apr 7, 2017, at 1:39 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > 
> > On Fri, Apr 07, 2017 at 12:56:48PM -0400, David Shaw wrote:
> >> On Apr 7, 2017, at 10:56 AM, Emmanuel Florac <eflorac@xxxxxxxxxxxxxx> wrote:
> >>> 
> >>> On Thu, 6 Apr 2017 14:41:45 -0400
> >>> David Shaw <dshaw@xxxxxxxxxxxxxxx> wrote:
> >>> 
> >>>> I'm having a problem with processes getting "stuck" when accessing an
> >>>> XFS (v5) filesystem.  When it happens, I start getting the "blocked
> >>>> for more than 120 seconds" error, and the process stays in that state
> >>>> until I reboot.  The kernel is 3.10.0-514.2.2 and xfsprogs is
> >>>> 4.5.0-9 (both CentOS 7.3).
> >>>> 
> >>> 
> >>> Could it be that your system is under high IO load?
> >> 
> >> It's possible that high load instigates or worsens the problem, but once the processes enter D state, they stay there, even when the system is idle, until I reboot.
> >> 
> > 
> > There isn't enough information here to tell whether the filesystem is
> > locked up or simply waiting on (very slow) I/O, as Emmanuel
> > suggested.
> 
> Ah, I didn't understand, thanks.
> 
> > If the filesystem appears to be deadlocked, can you provide the complete
> > hung task output (echo w > /proc/sysrq-trigger) as well as any activity
> > the tracepoints show if you enable them while it is in this state
> > (trace-cmd start -e xfs:*; cat /sys/kernel/debug/tracing/trace_pipe)?
> 
> I have the hung task output from a recent occurrence (http://www.jabberwocky.com/xfs/blocked.txt).  In it, "servxfs" is the process that reads from the XFS filesystem; it is the process feeding a FUSE filesystem.  "mmon" and "smbd" are both trying to access the FUSE filesystem.  I suspect they are blocked because the servxfs threads are blocked and therefore not completing the FUSE requests.
> 

Interesting, nothing obvious stands out to me. It looks like you have smbd
waiting on a pread over FUSE and mmon waiting on a stat. Presumably the
smbd pread corresponds to the blocked servxfs pread and the stat to one
of the servxfs getxattr calls.

Any idea where the other servxfs getxattr comes in?

> I will get the XFS traces the next time it happens.  It seems to happen 2-3 times a week, but frustratingly, I can't reproduce it on demand.  Is there anything in particular I should look for in the trace output, or a minimum length of time to capture it for?
> 
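
For reference, the capture suggested earlier in the thread can be bounded in time so it doesn't run until someone notices. This is a minimal sketch, not something proposed on the list: it assumes bash, root, an installed trace-cmd, and debugfs mounted at /sys/kernel/debug (the path used above). The function name capture_hang_state, the /tmp/xfs-hang output directory and the 60-second default are hypothetical choices.

```shell
#!/bin/bash
# Sketch: save the hung-task dump plus a bounded window of XFS tracepoint
# output. Assumes root, trace-cmd, and debugfs at /sys/kernel/debug.
# capture_hang_state and its defaults are hypothetical names/values.

trace_pipe=/sys/kernel/debug/tracing/trace_pipe

capture_hang_state() {
    local outdir=${1:-/tmp/xfs-hang}   # where to write the capture files
    local duration=${2:-60}            # seconds of trace output to keep
    mkdir -p "$outdir" || return 1

    # Ask the kernel to log all blocked (D state) tasks, then save the log.
    echo w > /proc/sysrq-trigger
    dmesg > "$outdir/hung-tasks.txt"

    # Enable every xfs tracepoint; quoting keeps the shell from globbing.
    trace-cmd start -e 'xfs:*'
    # trace_pipe blocks while empty, so bound the read with timeout(1).
    timeout "$duration" cat "$trace_pipe" > "$outdir/xfs-trace.txt"

    # Disable tracing again and clear the ring buffers.
    trace-cmd reset
}
```

Usage would be something like `capture_hang_state /root/hang-$(date +%s) 120` once processes are stuck. Because reading trace_pipe consumes events, an empty xfs-trace.txt after the window is itself informative: it suggests no XFS activity at all during that time.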

Can you elaborate on the resulting behavior? Are the same processes
always involved? Can you identify whether they are attempting to access
the same file(s) or not? Also, does the underlying filesystem keep
working for other processes while these ones are all blocked on reads?
E.g., can you read another file from the XFS fs, or is all I/O blocked?
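
One way to test that without hanging the shell you're testing from is to start the read in the background and only check on it afterwards, since a task in uninterruptible sleep generally won't act on signals (SIGKILL included) until it wakes. A minimal sketch, assuming bash and coreutils dd; probe_read is a hypothetical helper, and the file should be one that hasn't been read recently, otherwise the page cache may satisfy the read without touching XFS.

```shell
#!/bin/bash
# Sketch: check whether a file on the XFS fs can still be read, without
# letting a hung read block the calling shell. probe_read is hypothetical.
# Returns dd's exit status if the read finishes, 2 if it is still pending.
probe_read() {
    local file=$1 limit=${2:-10} i pid
    # Read one block in the background so a hang can't block this shell.
    dd if="$file" of=/dev/null bs=1M count=1 2>/dev/null &
    pid=$!
    for ((i = 0; i < limit; i++)); do
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid"   # reap dd and return its exit status
            return
        fi
        sleep 1
    done
    # Still running after the limit; a STAT starting with D means
    # the reader is in uninterruptible sleep, like the stuck processes.
    echo "read of $file still pending after ${limit}s (state: $(ps -o stat= -p "$pid"))" >&2
    return 2
}
```

Running `probe_read /path/on/xfs/some-cold-file 30` while the other processes are stuck distinguishes "only these requests are blocked" (returns 0) from "all reads on the fs are blocked" (returns 2, leaving one more D-state dd behind).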

Because multiple layers are involved here, presumably with custom code
in between (e.g., your FUSE userspace daemon), this might be easier to
reason about if you can dig into what is blocked in the upper layers
and describe precisely which high-level requests are active at the XFS
level.
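
A rough starting point for that digging is to list every thread in D state with its kernel wait channel and stack, and to see how many requests each FUSE connection still has outstanding. Sketch only, assuming procps ps, root for /proc/PID/task/TID/stack, and the fusectl filesystem mounted at /sys/fs/fuse/connections; list_d_state, dump_d_stacks and fuse_waiting are hypothetical names.

```shell
#!/bin/bash
# Sketch: show where D-state threads are waiting, layer by layer.
# Function names are hypothetical; stacks need root, FUSE counts need fusectl.

list_d_state() {
    # Header plus every thread whose state starts with D (uninterruptible).
    ps -eLo pid,tid,stat,wchan:32,comm | awk 'NR == 1 || $3 ~ /^D/'
}

dump_d_stacks() {
    # Kernel stack of each D-state thread (e.g. individual servxfs threads).
    ps -eLo pid=,tid=,stat= | awk '$3 ~ /^D/ {print $1, $2}' |
    while read -r pid tid; do
        echo "== pid $pid tid $tid ($(cat "/proc/$pid/task/$tid/comm" 2>/dev/null))"
        cat "/proc/$pid/task/$tid/stack" 2>/dev/null || echo "(stack unavailable)"
    done
}

fuse_waiting() {
    # Number of requests queued or in flight on each FUSE connection.
    local d
    for d in /sys/fs/fuse/connections/*/; do
        [ -r "$d/waiting" ] || continue
        echo "fuse conn $(basename "$d"): $(cat "$d/waiting") waiting"
    done
}
```

If smbd and mmon sit in FUSE request paths while the servxfs threads' stacks end in XFS or block-layer functions, that points below FUSE; if servxfs threads are idle while the waiting count stays non-zero, the problem is more likely in the userspace daemon.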

Brian

> Thanks again for your help,
> 
> David
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html