On 03/06/2013 11:16 AM, Julien FERRERO wrote:
Hi Emmanuel,
2013/3/6 Emmanuel Florac <eflorac@xxxxxxxxxxxxxx>:
On Wed, 6 Mar 2013 16:08:59 +0100, you wrote:
I am totally stuck and I really don't know how to reproduce the
corruption. I only know that the units tend to be power-cycled by the
operator while the fs is still mounted (no proper shutdown / reboot).
My guess was that the fs journal should handle this case and avoid such
corruption.
Wrong guess. It may or may not work, depending upon a long list of
parameters; the journal only guarantees that the filesystem's metadata
is consistent after replay, not that recently written data reached
stable storage. Basically, not turning the machine off properly is
asking for problems and corruption. The problem will be tragically
aggravated if your hardware RAID doesn't have a battery-backed cache.
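
To illustrate why the journal alone is not enough, here is a minimal
sketch, assuming Linux (the paths below are purely illustrative), of
what an application itself has to do if a single write must survive a
power cut. The journal replays filesystem metadata; it makes no promise
about application data that was never fsync'ed:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        const char msg[] = "critical state\n";

        /* 1. Write to a temporary file and fsync its data blocks. */
        int fd = open("/data/state.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, msg, sizeof(msg) - 1) != (ssize_t)(sizeof(msg) - 1)) {
                perror("write"); close(fd); return 1;
        }
        if (fsync(fd) != 0) { perror("fsync"); close(fd); return 1; }
        close(fd);

        /* 2. Atomically replace the old file with the new one. */
        if (rename("/data/state.tmp", "/data/state") != 0) {
                perror("rename"); return 1;
        }

        /* 3. fsync the directory so the rename itself is durable. */
        int dfd = open("/data", O_RDONLY | O_DIRECTORY);
        if (dfd < 0) { perror("open dir"); return 1; }
        if (fsync(dfd) != 0) { perror("fsync dir"); close(dfd); return 1; }
        close(dfd);

        return 0;
}

If the plug is pulled before step 3 completes, a crash can legitimately
leave either the old or the new file in place; the journal only
guarantees that the filesystem structures around them stay consistent.
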
OK, but our server spends 95% of its time reading data and 5% writing
data. We have a case of a server that did not write anything at the
time of failure (nor during the entire uptime session). Moreover, the
failure affects files that were opened read-only or weren't accessed
at all at the time of failure. I don't think the H/W RAID is the issue,
since we see the same corruption on another setup without H/W RAID.
Does the "ls" output with "???" look like fs corruption?
The storage stack can hold dirty data in volatile caches for a very long
time. Even if you open a file in "read-only" mode, you still generate a
fair amount of writes to storage (atime updates, for instance, dirty the
inode metadata). You can use blktrace or a similar tool to see just how
much data is actually written.
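
To make that concrete, here is a minimal sketch, assuming Linux and a
mount where atime updates are not suppressed (no noatime; note relatime
may defer the update), showing how a strictly read-only open can still
dirty inode metadata that eventually has to be written back:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        struct stat before, after;
        char buf[4096];

        if (stat(argv[1], &before) != 0) { perror("stat"); return 1; }

        sleep(1); /* let the clock advance past the old atime */

        /* A purely "read-only" access to the file. */
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        if (read(fd, buf, sizeof(buf)) < 0) perror("read");
        close(fd);

        sync(); /* push any dirtied metadata toward the device */

        if (stat(argv[1], &after) != 0) { perror("stat"); return 1; }

        printf("atime before: %ld\natime after:  %ld\n",
               (long)before.st_atime, (long)after.st_atime);
        puts(after.st_atime > before.st_atime
             ? "inode metadata was dirtied by a read-only open"
             : "no atime update (noatime/relatime mount?)");
        return 0;
}

On a noatime or relatime mount the update may not happen at all, which
is why blktrace against the real workload is the more reliable way to
see what is actually being written.
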
As mentioned earlier, you must always unmount cleanly as a best
practice. An operator who powers off machines with mounted file systems
needs to be educated or let go :)
Ric