Re: XFS filesystem on EC2 instance corrupts and shuts down

On 3/13/13 1:07 PM, Shrinath M wrote:
> Sorry to be asking in dev thread, but Amazon seems to be clueless in this case :(
> Can someone tell me where we can find the logs/output of xfs_repair
> after this runs? We just reboot the machine when we see this, and neither
> /var/log/messages nor dmesg seems to know anything about what it
> repaired.

xfs_repair does not run automatically at boot on any OS I know of; XFS simply
replays the log at mount time, so there is no xfs_repair output to look for.
But then I don't know what OS you are running; it looks like an Amazon special?
It's a pity they can't support the OS they provide you, because on an older
kernel like this, upstream developers will be less interested unless the
problem also shows up on upstream kernels.  This sort of support is usually
best left to an OS vendor.

But all that aside, you list this as the first error:

    Mar  5 01:14:33 ip-100-0-100-1 kernel: [14139930.248619] XFS (md0): Corruption detected. Unmount and run xfs_repair

but I am wondering if there might be more information before this which is not in your trimmed logs.

The text above is from xfs_corruption_error(), which calls xfs_error_report()
before printing that message.  xfs_error_report() should normally tell us a lot
more about what went wrong, for example something like
"Internal error %s at line %d of file %s.  Caller 0x%"
and possibly a hexdump or stack trace.
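
To check whether anything was logged just before the corruption message,
something along these lines could be run against the untrimmed log.  This is
only a sketch: xfs_corruption_context is a made-up helper name,
/var/log/messages is assumed to be where kernel messages land on this system,
and grep -B is assumed to be available (it is in GNU grep).

```shell
# Print every XFS "Corruption detected" line together with the lines
# logged just before it.
#   $1 = log file to search (assumed: /var/log/messages or saved dmesg output)
#   $2 = number of preceding lines to show (default 40)
xfs_corruption_context() {
    logfile=$1
    before=${2:-40}
    grep -B "$before" 'XFS.*Corruption detected' "$logfile"
}
```

For example, "xfs_corruption_context /var/log/messages 60" would show the 60
lines preceding each hit, which is where any "Internal error" line, hexdump, or
stack trace should appear if it was logged at all.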

One of the things in
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
is:

" dmesg output showing all error messages and stack traces "

If you really didn't get anything else before this, try:

    echo 11 > /proc/sys/fs/xfs/error_level

to capture the one instance where a corruption does not trigger verbose logs
at the default error level.  That actually might be what you hit.
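
If you want to be able to put the old setting back afterwards, a small wrapper
like the one below may help.  It is a sketch only: set_xfs_error_level and the
XFS_ERROR_LEVEL variable are names I made up for illustration, and it has to
run as root on a real system.  A value written under /proc/sys does not survive
a reboot.

```shell
# Location of the XFS error_level knob.  Overridable so the function can be
# tried against a scratch file first.
XFS_ERROR_LEVEL=${XFS_ERROR_LEVEL:-/proc/sys/fs/xfs/error_level}

# Write a new error level ($1, default 11) and print the previous one so the
# caller can restore it later.
set_xfs_error_level() {
    new=${1:-11}
    old=$(cat "$XFS_ERROR_LEVEL") || return 1
    echo "$new" > "$XFS_ERROR_LEVEL" || return 1
    echo "$old"
}
```

Typical use would be "old=$(set_xfs_error_level 11)" before trying to reproduce
the problem, and "set_xfs_error_level "$old" > /dev/null" once the verbose
output has been captured.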

You also get:

    Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.014259] XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.

AFAIK, 117 on Linux is EUCLEAN ("Structure needs cleaning"), which is what XFS
uses for EFSCORRUPTED there (the old EFSCORRUPTED value was 990).  That fits
the other references I see in various places to this error number coming from
XFS, so this looks like an earlier corruption report rather than an unknown
error.
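
When in doubt about what an error number means on a given box, looking it up in
the installed kernel headers is a quick check.  A minimal sketch, assuming the
headers live at /usr/include/asm-generic/errno.h (that path and the errno_name
helper are my assumptions, not anything XFS ships):

```shell
# Print the symbolic name(s) defined for errno value $1.
#   $1 = numeric errno, e.g. 117
#   $2 = errno header to search (assumed default path below)
errno_name() {
    num=$1
    header=${2:-/usr/include/asm-generic/errno.h}
    # Match lines of the form "#define NAME VALUE" and print NAME.
    awk -v n="$num" '$1 == "#define" && $3 == n { print $2 }' "$header"
}
```

Running "errno_name 117" against the headers that match the running kernel
shows which errno, if any, that value corresponds to.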

-Eric

> 
> On Wed, Mar 6, 2013 at 7:55 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> 
>     I would suggest contacting Amazon's customer support channel (or the vendor you paid for the Linux instance you are running).
> 
>     XFS developer list is probably not the correct forum to help you debug this :)
> 
>     Good luck!
> 
>     Ric
> 
> 
> 
>     On 03/06/2013 08:12 AM, Supratik Goswami wrote:
> 
>         Have we created a ticket with AWS?
> 
>         It could be an EBS issue, who knows; we need to confirm that first.
> 
>         --
>         Warm Regards
> 
>         Supratik
> 
> 
>         On Wed, Mar 6, 2013 at 6:38 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> 
>             On 03/06/2013 08:03 AM, Shrinath M wrote:
> 
> 
>                 On Wed, Mar 6, 2013 at 6:29 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> 
>                     I think that you would need to verify that the Amazon storage is not
>                     throwing errors - do your logs show IO errors or issues before XFS
>                     hits an issue?
> 
> 
>                 No IO errors in /var/log/messages.
>                 Where else should I be looking?
> 
> 
> 
>             Feb 12 19:47:18 ip-100-0-100-1 kernel: [2541168.023638] XFS (md0): I/O Error Detected. Shutting down filesystem
> 
>             Is an IO error from MD.
> 
>             I would suggest trying to reproduce without MD in the picture first -
>             always best to try to reproduce with the simplest setup first and work
>             your way up the complexity ladder.
> 
>             Ric
> 
> 
> 
> 
>         _________________________________________________
>         xfs mailing list
>         xfs@xxxxxxxxxxx
>         http://oss.sgi.com/mailman/listinfo/xfs
> 
> 
> 
> 
> 
> -- 
> Regards
> *Shrinath.M*