Re: Files with non-ASCII names inaccessible after xfs_repair

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 13 Jan 2014 12:50:07 +1100

On Sun, Jan 12, 2014 at 11:53:59AM -0800, Zachary Kotlarek wrote:
> 
> On Jan 12, 2014, at 10:47 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
> 
> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > 
> > If this is due to a bug it may have already been fixed.  Note the first
> > two things asked for.
> 
> 
> Thanks for the pointer.
> 
> My kernels a bit old, but xfsprogs is shiny and new:
> Linux vera 2.6.39.2 #1 SMP Fri Sep 30 23:55:41 PDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> xfs_repair version 3.1.11
> 
> 2x4 core CPUs
> 8 GB RAM, mostly free (more than 6 GB cached)
> 
> Related mount:
> /dev/lvmsas/tv /mnt/media/TV xfs rw,nosuid,nodev,noexec,relatime,attr2,delaylog,inode64,sunit=1024,swidth=4096,noquota 0 0
> 
> Underlying partition:
>  254       31 16252928000 dm-31
> 
> Which is a no-frills LVM2 volume allocation over mdadm raid-6.
> 
> meta-data=/dev/lvmsas/tv         isize=256    agcount=33, agsize=126975872 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=4063232000, imaxpct=5
>          =                       sunit=128    swidth=512 blks
> naming   =version 2              bsize=4096   ascii-ci=1
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Attempts to access the now-busted files/directories with accents in their paths result in a kernel log like:
> Jan 11 02:05:39 vera XFS (dm-31): I/O error occurred: meta-data dev dm-31 block 0x3c8ff73e0       ("xfs_trans_read_buf") error 11 buf count 4096

error 11 = EAGAIN/EWOULDBLOCK
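
The number in that log line is a plain Linux errno value. As a rough sketch, the line can be picked apart and the errno named as below. The errno table is a hand-copied subset of the generic Linux numbers, and the helper name and regex are made up for illustration, not taken from any XFS tool:

```python
import re

# Subset of the generic Linux errno numbers, hard-coded so the result does
# not depend on the platform running this script (assumption: Linux values).
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 12: "ENOMEM", 28: "ENOSPC", 30: "EROFS"}

# Hypothetical pattern matching the "I/O error occurred" line quoted above.
LOG_RE = re.compile(
    r'dev (?P<dev>\S+) block (?P<block>0x[0-9a-fA-F]+)\s+'
    r'\("(?P<func>\w+)"\) error (?P<errno>\d+) buf count (?P<count>\d+)'
)

def decode_xfs_io_error(line):
    """Return the fields of an XFS I/O error line, or None if it doesn't match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    err = int(m.group("errno"))
    return {
        "dev": m.group("dev"),
        "block": int(m.group("block"), 16),
        "func": m.group("func"),
        "errno": err,
        "name": LINUX_ERRNO.get(err, "unknown"),
        "count": int(m.group("count")),
    }

line = ('XFS (dm-31): I/O error occurred: meta-data dev dm-31 block 0x3c8ff73e0'
        '       ("xfs_trans_read_buf") error 11 buf count 4096')
info = decode_xfs_io_error(line)
print(info["name"], info["func"], hex(info["block"]))
```

On Linux, EWOULDBLOCK is defined as the same value as EAGAIN, which is why both names apply to error 11.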

That tends to imply that there's some interesting error occurring in
the layers below XFS here. XFS on a kernel that old is not expecting
an EAGAIN error from storage, so it is likely not being captured
properly. There have been bugs in the raid/dm code in the past that
would cause issues like this, and bugs in the XFS error handling
that allowed them to slip through and shut down the filesystem.

For example, this fix made in March 2013:

$ gl -n1 -p c163f9a
commit c163f9a1760229a95d04e37b332de7d5c1c225cd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Mar 12 23:30:34 2013 +1100

    xfs: ensure we capture IO errors correctly

    Failed buffer readahead can leave the buffer in the cache marked
    with an error. Most callers that then issue a subsequent read on the
    buffer do not zero the b_error field out, and so we may incorrectly
    detect an error during IO completion due to the stale error value
    left on the buffer.

    Avoid this problem by zeroing the error before IO submission. This
    ensures that the only IO errors that are detected are those captured
    from bio submission or completion.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

Is probably relevant, but there are many more changes up and down
the stack that may be the cause of your problem. Indeed, the above
fix may simply turn EAGAIN into EIO because there really is
something wrong with that block on disk....
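
To illustrate the stale-error problem that the commit message above describes, here is a toy model. It is not XFS code: only the b_error field name comes from the commit message, and the class, functions and parameters are hypothetical stand-ins.

```python
EIO = 5  # generic Linux I/O error number

class Buf:
    """Toy stand-in for a cached metadata buffer; only b_error matters here."""
    def __init__(self):
        self.b_error = 0

def submit_io(buf, device_ok, clear_error_first):
    """Model one read and return the error seen at IO completion."""
    if clear_error_first:
        buf.b_error = 0      # the fix: zero the error before IO submission
    if not device_ok:
        buf.b_error = EIO    # completion records a real failure
    return buf.b_error       # completion handling checks b_error

def readahead_then_read(clear_error_first):
    """Failed readahead followed by a successful read of the same buffer."""
    buf = Buf()
    submit_io(buf, device_ok=False, clear_error_first=clear_error_first)
    return submit_io(buf, device_ok=True, clear_error_first=clear_error_first)

print(readahead_then_read(False))  # stale readahead error is reported again
print(readahead_then_read(True))   # only errors from this IO are reported
```

Without the clearing step, the second read reports the leftover readahead error even though the device answered correctly. With it, the read reports only what happened during its own submission and completion.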

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
