Re: xfs umount with i/o error hang/memory corruption

On Fri, Apr 4, 2014 at 12:50 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
On 4/4/2014 1:15 PM, Bob Mastors wrote:
> Greetings,
>
> I am new to xfs and am running into a problem
> and would appreciate any guidance on how to proceed.
>
> After an i/o error from the block device that xfs is using,
> an umount results in a message like:
> [  370.636473] XFS (sdx): Log I/O Error Detected.  Shutting down filesystem
> [  370.644073] XFS (h           ���h"h          ���H#h          ���bsg):
> Please umount the filesystem and rectify the problem(s)
> Note the garbage on the previous line which suggests memory corruption.
> About half the time I get the garbled log message. About half the time
> umount hangs.
>
> And then I get this kind of error and the system is unresponsive:
> Message from syslogd@debian at Apr  4 09:27:40 ...
>  kernel:[  680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
>
> The problem appears to be similar to this issue:
> http://www.spinics.net/lists/linux-xfs/msg00061.html
>
> I can reproduce the problem easily using open-iscsi to create
> the block device with an iscsi initiator.
> I use lio to create an iscsi target.
>
> The problem is triggered by doing an iscsi logout which causes
> the block device to return i/o errors to xfs.
> Steps to reproduce the problem are below.
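(For reference, a reproduction of this kind looks roughly like the following. This is only an illustrative sketch: the target IQN, portal address, device name, and mount point are placeholders, not the original poster's exact steps.)

    # target side: export a LUN over iSCSI with LIO (e.g. via targetcli), portal 192.168.1.10
    # initiator side:
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m node -T iqn.2014-04.com.example:test -p 192.168.1.10 --login
    mkfs.xfs /dev/sdx                                      # /dev/sdx is the new iSCSI-backed disk
    mount /dev/sdx /mnt/test
    dd if=/dev/zero of=/mnt/test/file bs=1M count=1000 &   # keep some I/O in flight
    iscsiadm -m node -T iqn.2014-04.com.example:test -p 192.168.1.10 --logout
    umount /mnt/test                                       # this is where the hang or garbled shutdown message appears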

This is not a problem but the expected behavior.  XFS is designed to do
this to prevent filesystem corruption.  Logging out of a LUN is no
different than pulling the power plug on a direct attached disk drive.
Surely you would not do that to a running filesystem.

Sorry, I don't think I was clear about the nature of the problem and its wide-ranging effects.
The goal is to use iSCSI to access block storage on another server.
That server can fail, which can result in the iSCSI initiator returning I/O errors to XFS.

The behavior I would like from XFS is for it to put the filesystem into some kind of offline state
when the block device returns an I/O error. Today XFS usually does exactly that.
But there is a corner case where XFS instead hangs the entire server, forcing a hard reboot.
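That offline state already exists in XFS as the forced "shutdown" state, which is what the "Shutting down filesystem" message above refers to. For illustration, it can also be triggered by hand with xfs_io in expert mode (a sketch, assuming /mnt/test is a mounted XFS filesystem):

    # force the filesystem into the shutdown state
    xfs_io -x -c "shutdown -f" /mnt/test
    # further I/O against the filesystem now fails with EIO instead of hanging
    touch /mnt/test/probe             # fails: Input/output error
    # unmounting a shut-down filesystem normally succeeds
    umount /mnt/test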

On a large file server with many filesystems using iSCSI to reach block storage on multiple
block servers, I would like the failure of a single block server to affect only the filesystems
that depend on that server, and to leave the other filesystems untouched.

Bob

> Using VirtualBox, I can reproduce it with two processors but not one.
> I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.
...

The only problem I see here is that XFS should be shutting down every
time the disk device disappears.  Which means in your test cases where
it does not, your VM environment isn't passing the IO errors up the
stack, and it should be.  Which means your VM environment is broken.
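One quick way to check whether the errors are actually reaching the filesystem (a sketch; /mnt/test and the probe file are placeholders for a filesystem on the failed device):

    # after the device goes away, the kernel log should show the I/O errors
    dmesg | tail -n 50
    # and I/O through the filesystem should fail with EIO rather than succeed silently
    dd if=/dev/zero of=/mnt/test/probe bs=4k count=1 oflag=direct,sync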

Cheers,

Stan



_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
