Re: Files with non-ASCII names inaccessible after xfs_repair

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 13 Jan 2014 14:19:47 +1100

On Sun, Jan 12, 2014 at 06:36:03PM -0800, Zachary Kotlarek wrote:
> 
> On Jan 12, 2014, at 5:50 PM, Dave Chinner <david@xxxxxxxxxxxxx>
> wrote:
> 
> >> Attempts to access the now-busted files/directories with
> >> accents in their paths result in a kernel log like: Jan 11
> >> 02:05:39 vera XFS (dm-31): I/O error occurred: meta-data dev
> >> dm-31 block 0x3c8ff73e0       ("xfs_trans_read_buf") error 11
> >> buf count 4096
> > 
> > error 11 = EAGAIN/EWOULDBLOCK
> > 
> > That tends to imply that there's some interesting error
> > occurring in the layers below XFS here.
> 
> 
> The error you note only started showing up *after* the xfs_repair,
> and only when attempting to access the non-ascii file paths. It

Sure, but it's an error coming from the block layer, not from
decoding the contents of the block at the directory layer. IOWs,
xfs_repair wrote new contents to those blocks, and now the kernel
cannot read them from disk.

> doesn't take the filesystem offline; other than those
> particular paths being inaccessible the filesystem seems to be
> working correctly (though I've suspended user writes until
> this is worked out).

Right - it's a read error, not a write error, so it simply returns
the error to the reader. However, if that read error occurs in the
context of a transaction that has already made modifications (e.g.
adding a new file to the directory) it will result in a shutdown of
the filesystem.

> The affected paths are all around the disk,
> all contain non-ascii characters in final portion of the path
> name, and do not affect other paths in the same directory.
> 
> I can find a newer kernel to boot off and see how it behaves if
> you think it would make any difference, but I'm pretty sure
> xfs_repair re-wrote the affected directory entries and broke them
> as opposed to some sort of block-layer corruption being
> responsible for breaking only these files.

Try using xfs_db to read and parse the blocks that the filesystem is
choking on. If xfs_db can't read them, then something has gone wrong
below XFS. If you can read them, use xfs_db to parse the block as a
directory block and see what raw directory entries the block
contains....

e.g.
# xfs_db <dev>
xfs_db> convert daddr 0x3c8ff73e0 fsb
<fsbno>
xfs_db> fsb <fsbno>
xfs_db> p
<hex output>
xfs_db> type dir2
xfs_db> p
<decoded output>
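
As a cross-check outside xfs_db, the disk address from the kernel log
can be turned into a byte offset and read straight from the block
device. This is only a minimal sketch: it assumes the daddr is in
512-byte basic blocks and that the buffer is 4096 bytes, as the log
line reports. The names daddr_to_byte_offset and read_block are
hypothetical helpers, not part of xfsprogs, and /dev/dm-31 is a
placeholder for whatever device is actually in use.

```python
import os

BASIC_BLOCK_SIZE = 512  # assumption: daddr counts 512-byte basic blocks


def daddr_to_byte_offset(daddr):
    """Convert a disk address in basic blocks to a byte offset."""
    return daddr * BASIC_BLOCK_SIZE


def read_block(path, daddr, length=4096):
    """Read `length` bytes at `daddr` from `path`.

    Raises OSError if the layers below cannot return the data, which
    is the case this check is meant to distinguish.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, daddr_to_byte_offset(daddr))
    finally:
        os.close(fd)


# Example (run as root against the real device, read-only):
#   data = read_block("/dev/dm-31", 0x3c8ff73e0)
#   print(len(data), data[:4])
```

If the read raises OSError here too, that points below XFS; if it
returns data, the contents of the block are what need looking at.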

The decoded output should look something like:

xfs_db> p
dhdr.hdr.magic = 0x58444433
dhdr.hdr.crc = 0xc77dda8e (correct)
dhdr.hdr.bno = 1308
dhdr.hdr.lsn = 0xe60000aa35
dhdr.hdr.uuid = d2d0bec5-c8b8-420e-8a34-d981be7eece6
dhdr.hdr.owner = 32
dhdr.bestfree[0].offset = 0xa0
dhdr.bestfree[0].length = 0x10
dhdr.bestfree[1].offset = 0x4f8
dhdr.bestfree[1].length = 0x10
dhdr.bestfree[2].offset = 0x850
dhdr.bestfree[2].length = 0x10
du[0].inumber = 32
du[0].namelen = 1
du[0].name = "."
du[0].filetype = 2
du[0].tag = 0x40
du[1].inumber = 32
du[1].namelen = 2
du[1].name = ".."
du[1].filetype = 2
du[1].tag = 0x50
du[2].inumber = 393552
du[2].namelen = 3
du[2].name = "tmp"
du[2].filetype = 2
du[2].tag = 0x60
du[3].inumber = 36
du[3].namelen = 11
du[3].name = "syscalltest"
du[3].filetype = 1
du[3].tag = 0x70
du[4].inumber = 37
du[4].namelen = 8
du[4].name = "fsstress"
du[4].filetype = 2
du[4].tag = 0x88
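
For a large directory block it may help to check the decoded output
mechanically. The sketch below is an assumption-laden helper, not an
xfsprogs tool: it assumes the decoded text has the "du[N].namelen" and
"du[N].name" lines in the form shown above, and that the printed names
are UTF-8. It reports entries whose namelen (a byte count) does not
match the byte length of the printed name, and it renders a magic
number as ASCII. How xfs_db prints non-ASCII bytes may differ from
this assumption, so the hexdump remains the authoritative source.

```python
import re

# Matches lines such as: du[3].namelen = 11   or   du[3].name = "tmp"
ENTRY_RE = re.compile(r'^du\[(\d+)\]\.(namelen|name) = (.*)$')


def magic_to_ascii(magic):
    """Render a 32-bit magic number as its four ASCII characters."""
    return magic.to_bytes(4, "big").decode("ascii")


def namelen_mismatches(decoded_text):
    """Return (index, namelen, name) for entries where namelen differs
    from the UTF-8 byte length of the printed name."""
    entries = {}
    for line in decoded_text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue
        index, field, value = int(m.group(1)), m.group(2), m.group(3)
        entry = entries.setdefault(index, {})
        if field == "namelen":
            entry["namelen"] = int(value)
        else:
            # Drop the surrounding double quotes, if present.
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            entry["name"] = value
    bad = []
    for index in sorted(entries):
        e = entries[index]
        if "namelen" in e and "name" in e:
            if len(e["name"].encode("utf-8")) != e["namelen"]:
                bad.append((index, e["namelen"], e["name"]))
    return bad
```

For the sample above, magic_to_ascii(0x58444433) gives "XDD3" and
every listed entry has a namelen equal to the length of its name.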

If you do get output like this, can you post both the hexdump
and the decoded output from xfs_db?

Hmmmm - you're not using LVM snapshots or anything like that, are
you?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
