Re: Possible ext4 corruption - ACL related?

Kevin Shanahan <kmshanah@xxxxxxxxxxx> · Thu, 12 Mar 2009 04:37:10 +1030

On Wed, 2009-03-11 at 09:25 -0400, Theodore Tso wrote: 
> On Wed, Mar 11, 2009 at 12:18:39AM -0600, Andreas Dilger wrote:
> > On Mar 11, 2009  12:18 +1030, Kevin Shanahan wrote:
> > > On Wed, 2009-03-11 at 12:13 +1030, Kevin Shanahan wrote:
> > > > 
> > > >   getfattr: apps/Gestalt.Net/SetupCD/program\040files/Business\040Objects/Common/3.5/bin/RptControllers.dll: Input/output error
> > > > 
> > > > And syslog shows:
> > > >   Mar 11 00:06:24 hermes kernel: attempt to access beyond end of device
> > > >   Mar 11 00:06:24 hermes kernel: dm-0: rw=0, want=946232834916360, limit=2147483648
> > > > 
> > > > hermes:~# debugfs /dev/dm-0
> > > > debugfs 1.41.3 (12-Oct-2008)
> > > > debugfs:  stat "local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/RptControllers.dll"
> > > > 
> > > > Inode: 875   Type: FIFO    Mode:  0611   Flags: 0xb3b9c185
> > > > Generation: 3690868    Version: 0x9d36b10d
> > > > User: 868313917   Group: -1340283792   Size: 0
> > > > File ACL: 0    Directory ACL: 0
> > > > Links: 1   Blockcount: 0
> > > > Fragment:  Address: 0    Number: 0    Size: 0
> > > > ctime: 0x0742afc4 -- Sun Nov 11 06:51:24 1973
> > > > atime: 0x472a2311 -- Fri Nov  2 05:33:45 2007
> > > > mtime: 0x80c59881 -- Fri Jun 18 09:51:21 2038
> > > > Size of extra inode fields: 4
> > > > BLOCKS:
> > 
> > There isn't anything obvious here that would imply reading a wacky block
> > beyond the end of the filesystem.  I even checked if e.g. you had quotas
> > enabled and the bogus UID/GID would result in the quota file becoming
> > astronomically large or something, but the numbers don't seem to match.
> 
> More to the point, given that mode bits of the file detected the file
> as a named pipe ("Type: FIFO"), it wouldn't have tried to access the
> disk.  Trying to read from a named pipe would have resulted in a
> hang (assuming no data in the named pipe); writing to the named pipe would
> have succeeded (and queued the data until another program tried
> reading from the named pipe).  So getting an I/O error from that file
> doesn't make any sense.

But getfattr isn't going to cause a read from the pipe, is it? I would
expect it to cause a read from the disk instead.

> > Yes, you should just delete the inodes reported corrupted in your
> > earlier postings in the 87x range - they contain nothing of value
> > anymore, and I suspect your troubles would be gone.  At least we
> > wouldn't be left wondering if you are seeing new corruption in
> > the same range of blocks, or just leftover badness.
> 
> The inodes in question that are on that block would be inode numbers
> 864 to 879, inclusive.  You can get the names of the files in question
> using the ncheck command:
> 
> debugfs: ncheck 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879

debugfs:  ncheck 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879
Inode	Pathname
864	/local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/Cdo32pl.dll
875	/local/apps/Gestalt.Net/SetupCD/program files/Business Objects/Common/3.5/bin/RptControllers.dll

I haven't rm'd the files just yet, but good to know which ones they are.

> ... but at this point, I'm beginning to wonder if what is going on is
> something in the I/O stack is occasionally returning random garbage
> when you read from the particular block in question.  The contents
> reported for debugfs for block 875 should not have caused an I/O error
> when you tried reading from the file.  You can create your own named
> pipe by using the command "mknod /tmp/test-fifo p", and playing with
> it.

Okay, a quick check shows that getfattr on the test-fifo file doesn't
cause an I/O error and it doesn't block either. A straight read from the
pipe does block of course.
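
For anyone wanting to repeat that check, here is a rough Python sketch of what I mean. The helper name probe_fifo and the file name are just made up for illustration; os.listxattr is Linux-only and lists attribute names by path, so it never opens the pipe for reading.

```python
import os
import stat
import tempfile


def probe_fifo(directory):
    """Create a FIFO in `directory` and list its xattr names without opening it."""
    path = os.path.join(directory, "test-fifo")
    os.mkfifo(path)  # same effect as: mknod test-fifo p
    try:
        is_fifo = stat.S_ISFIFO(os.lstat(path).st_mode)
        try:
            # Queries the inode by path; the pipe itself is never opened.
            names = os.listxattr(path)
        except OSError as exc:
            # For example, a filesystem without xattr support.
            print("listxattr failed:", exc)
            names = None
    finally:
        os.unlink(path)
    return is_fifo, names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_fifo(tmp))
```

On a healthy filesystem this should return straight away rather than hang, which is what I saw with getfattr.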

> So I'm wondering if when the kernel read block 875, it got one
> version of garbage, and then when debugfs read block 875 later, it got
> another version of garbage.
> 
> One of the original inodes involved was 867, right?  You might want to
> try using the "stat <867>" command and seeing if it still contains
> garbage or not.  Since that was one e2fsck should have deleted for you (or
> did you delete it manually yourself?), it should either be all zeros,
> or it should contain the same inode garbage you had sent to the list,
> but with an i_links_count of zero if you deleted the file via the
> "rm" command.  If it contains a different version of garbage, then
> something is corrupting that block, possibly on the read path or the
> write path.

debugfs:  stat <867>

Inode: 867   Type: bad type    Mode:  0404   Flags: 0x0
Generation: 2483046020    Version: 0x17a7fdfd
User: 1455931783   Group: -798021131   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 956780679    Number: 0    Size: 0
ctime: 0xdca60244 -- Wed Apr 23 01:54:36 2087
atime: 0x5c9e956c -- Sat Mar 30 08:30:12 2019
mtime: 0x2ce44e11 -- Sat Nov 13 13:31:37 1993
dtime: 0x49b6564f -- Tue Mar 10 22:30:15 2009
Size of extra inode fields: 4
BLOCKS:
(0):1487030929, (1):3739364871, (2):16299385, (3):2955804704,
(4):3028301176, (5):3255403360, (6):4066441585, (7):643698920,
(8):377498450, (9):297332775, (10):2206476866, (11):169813600,
(IND):2885921245, (DIND):1077961371, (TIND):3308808842
TOTAL: 15

Looks like fsck cleaned up a number of the fields, but did not zero all
of them. The inode also seems to have gained some blocks, but I guess
that is meaningless for an unlinked inode?
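
As a quick sanity check on those block numbers, here is a small Python sketch that compares them against the device size from the earlier syslog line. It assumes a 4 KiB filesystem block size (I haven't confirmed that in this thread) and that the kernel's "limit=" value counts 512-byte sectors.

```python
SECTOR_SIZE = 512   # the kernel message counts 512-byte sectors
BLOCK_SIZE = 4096   # ASSUMPTION: 4 KiB ext4 blocks, not confirmed in this thread

LIMIT_SECTORS = 2147483648       # "limit=" from the syslog line
WANT_SECTORS = 946232834916360   # "want=" from the syslog line

# Block map printed by "stat <867>": 12 direct pointers, then IND, DIND, TIND.
INODE_867_BLOCKS = [
    1487030929, 3739364871, 16299385, 2955804704,
    3028301176, 3255403360, 4066441585, 643698920,
    377498450, 297332775, 2206476866, 169813600,
    2885921245, 1077961371, 3308808842,
]


def device_blocks(limit_sectors, block_size=BLOCK_SIZE):
    """Number of filesystem blocks that fit on the device."""
    return limit_sectors * SECTOR_SIZE // block_size


def out_of_range(blocks, total_blocks):
    """Block numbers that point past the end of the device."""
    return [b for b in blocks if b >= total_blocks]


total = device_blocks(LIMIT_SECTORS)
bad = out_of_range(INODE_867_BLOCKS, total)
print(f"device holds {total} blocks")
print(f"{len(bad)} of {len(INODE_867_BLOCKS)} pointers lie beyond the device")
print("syslog request beyond device:", WANT_SECTORS > LIMIT_SECTORS)
```

Under those assumptions most of the pointers fall outside the device, which would fit them being leftover garbage rather than real allocations.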

> We're now at the stage where I have to start asking questions about
> the storage stack --- i.e. have you used this with this exact
> hardware/configuration with ext3, and was it stable there; have you
> made any recent changes to the hardware/configuration, etc., since
> this is starting to smell like a potential storage stack problem.

The same hardware has been stable with an ext3 filesystem for about 9
months before this. The only other change that I made at the same time
as moving to ext4 was reconfiguring the RAID from what was originally a
10-disk RAID6 array to a 9-disk RAID6 array with room for one hot spare
(though that spare is not installed currently).

In case it matters, the change to ext4 was done by creating a new
filesystem and rsync'ing the data across, rather than upgrading an
existing ext3 filesystem.

Cheers,
Kevin.
