Re: How to repair MDS damage?

On Tue, Feb 14, 2017 at 9:33 AM, Oliver Schulz <oschulz@xxxxxxxxxx> wrote:
> Dear Ceph Experts,
>
> after upgrading our Ceph cluster from Hammer to Jewel,
> the MDS (after a few days) found some metadata damage:
>
>    # ceph status
>    [...]
>    health HEALTH_ERR
>          mds0: Metadata damage detected
>    [...]
>
> The output of
>
>    # ceph tell mds.0 damage ls
>
> is:
>
>    [
>       {
>          "ino" : [...],
>          "id" : [...],
>          "damage_type" : "backtrace"
>       },
>       [...]
>    ]
>
> There are 5 such "damage_type" : "backtrace" entries in total.
>
> I'm not really surprised: there were a few instances in
> the past where one or two (mostly empty) directories and
> symlinks acted strangely and couldn't be deleted
> ("rm" resulted in "Invalid argument"). Back then, I moved them
> all into a "quarantine" directory, but wasn't able to do anything
> about it.
>
> Now that CephFS does more rigorous checks and has spotted
> the trouble - how do I go about repairing this?

From Kraken onwards, backtraces can be repaired using "ceph daemon
mds.<id> scrub_path <path> recursive repair" on the path containing
the primary dentry for the file (i.e. for hardlinks, the place the
file was originally created).

To identify the path that contains the inode, you can either do an
exhaustive search using `find` (its `-inum` argument lets you search
by inode number), or search your MDS logs for the point where the
damage was found, where the path may have been printed. However, the
path in use when the damage was detected may have been a remote link
(i.e. a hardlink), which wouldn't be the path you want.
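
The exhaustive search could also be sketched in Python, roughly
equivalent to running `find` with `-inum` on a client mount. This is
only a sketch: it assumes the filesystem is mounted on a client and
that the inode number the client reports matches the "ino" value from
"damage ls". The function name is made up for illustration and is not
part of any Ceph tool.

```python
import os

def find_paths_by_inode(root, ino):
    """Return every path below root whose inode number equals ino."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat checks a symlink itself instead of following it
                if os.lstat(path).st_ino == ino:
                    matches.append(path)
            except OSError:
                # the entry vanished or cannot be stat'ed; skip it
                continue
    return matches
```

For a hardlinked file this returns several paths. It cannot tell you
which one holds the primary dentry, so that still has to be worked out
by hand.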

For pre-Kraken MDS versions, you can work around this by changing the
file's name or immediate parentage (i.e. renaming or moving the file),
then using "ceph daemon mds.<id> flush journal" to force the MDS to
flush out the new backtrace immediately.
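
As a rough sketch, the rename-then-flush workaround might be scripted
like this. It assumes the CephFS mount and the MDS admin socket are
reachable from the same host ("ceph daemon" has to run on the host
where that MDS runs); otherwise do the two steps separately. The
function name and the `run` parameter are invented for illustration.

```python
import os
import subprocess

def rename_and_flush(old_path, new_path, mds_id, run=subprocess.check_call):
    """Rename a file, then ask the MDS to flush its journal."""
    # refuse to overwrite an existing entry at the new location
    if os.path.lexists(new_path):
        raise FileExistsError(new_path)
    # changing the name or parent directory gives the file a new backtrace
    os.rename(old_path, new_path)
    cmd = ["ceph", "daemon", "mds." + mds_id, "flush", "journal"]
    run(cmd)  # by default executes the command and raises on failure
    return cmd
```

Passing a different `run` callable (for example `print`) shows the
command without executing it.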

Once you believe a new backtrace is written, use the "damage rm"
command to clear the damage entry, and try accessing the file via a
hardlink again to see if it's working now.
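
If there are several entries to clear, the "damage rm" commands could
be generated from the "damage ls" output with something like the
sketch below. It assumes the output is a JSON list of objects with the
"id" and "damage_type" fields shown in the original post, and that
"damage rm" takes the entry's "id". The function name is made up, and
the sample values in the usage note are placeholders, not real data.

```python
import json

def backtrace_damage_rm_commands(damage_ls_output, mds_rank=0):
    """Build one "damage rm" command per backtrace damage entry."""
    entries = json.loads(damage_ls_output)
    return [
        ["ceph", "tell", "mds.%d" % mds_rank, "damage", "rm", str(e["id"])]
        for e in entries
        # leave other damage types alone; they need different handling
        if e.get("damage_type") == "backtrace"
    ]
```

For example, feeding it
'[{"ino": 1234, "id": 42, "damage_type": "backtrace"}]' yields a single
command ending in "damage rm 42". Only run the resulting commands once
you believe the new backtraces have been written.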

John

>
>
> Cheers and thanks for any help,
>
> Oliver
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com