Re: xfs umount hang in xfs_ail_push_all_sync on i/o error

On Fri, May 2, 2014 at 6:05 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Fri, May 02, 2014 at 11:32:19AM -0600, Bob Mastors wrote:
> On Tue, Apr 29, 2014 at 9:22 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > Well, I guess I should have been more specific. ftrace is pretty much
> > useless here - we need the information that is recorded by the
> > events, not the timing of the events.
> >
> Heh, I have been trying to bluff my way through this.
> I have hacked on file systems but am new to linux and xfs.

I wouldn't have guessed that. Most of the time the response is
"trace events? what are they?". :)

> Output file attached but it all looks like this:
>
>      xfsaild/sdb-3613 [001] 445.757616: xfs_ail_flushing: dev 8:16 lip 0x0xffff88001d3017b8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>      xfsaild/sdb-3613 [001] 445.757617: xfs_ail_flushing: dev 8:16 lip 0x0xffff88001d301850 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>      xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16 lip 0x0xffff88001de63ed8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>      xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16 lip 0x0xffff88001cd24720 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>      xfsaild/sdb-3613 [001] 445.757618: xfs_ail_flushing: dev 8:16 lip 0x0xffff88001cd248e8 lsn 32/3573 type XFS_LI_INODE flags IN_AIL
>      xfsaild/sdb-3613 [001] 445.757619: xfs_ail_flushing: dev 8:16 lip 0x0xffff88002c435ab0 lsn 32/3573 type XFS_LI_INODE flags IN_AIL

That's very enlightening - a bunch of inodes returning
XFS_ITEM_FLUSHING from attempts to push them.

> The addresses repeat.
> I started the record as soon as I noticed the umount hanging, probably
> within 20 seconds or so.
> Also attached is the console output.
>
> This was done with your patch applied.
>
> Seems like xfsaild_push is missing a check that the file system is
> XFS_FORCED_SHUTDOWN.

It might look that way, but what is supposed to happen is that
everything else is supposed to handle shutdown situations and empty
the AIL. i.e. the act of issuing IO by the AIL is supposed to
trigger shutdown processing on the objects, and so having the AIL
shut down itself won't help.

So, what's the problem? Well, the inodes have been flushed,
triggering the inode buffer to be written. However, the inode
buffer has hit an IO error (i.e. the async write error), been
retried, and then we stop retrying the IO. The idea is that the
AIL will push it again soon and the IO will be retried.

What is happening is that the inodes in the buffer are already
marked as "being flushed", so the AIL does not push the inode
buffer again, and hence the IO completion callbacks on the buffer
that would remove the inodes from the AIL are never called. Not
sure how to fix this one yet...
Thanks for the explanation.
It clears up some questions I had.
 

> Let me know if there is any other info I can capture that would be
> interesting.

I know what the problem is now, so testing a fix will be the next
step. But I need to think about how to fix this, and that can wait
till Monday. ;)
Just an FYI: next week is busy for me with some other stuff, so I
won't be able to try out a fix until the following week.
 

Thanks for following up with the trace information, Bob!
Not a problem.

Thanks,
Bob
 

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
