Re: damaged cephfs

Magnus HAGDORN <Magnus.Hagdorn@xxxxxxxx> · Sat, 5 Sep 2020 08:10:44 +0000

Hi Patrick,
thanks for the reply

On Fri, 2020-09-04 at 10:25 -0700, Patrick Donnelly wrote:
> > We then started using the cephfs (we keep VM images on the cephfs).
> > The MDS were showing an error. I restarted the MDS but they didn't
> > come back. We then followed the instructions here:
> > https://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> > up to truncating the journal. The MDS started again. However, as soon
> > as we started writing to the cephfs the MDS crashed. A scrub of the
> > cephfs revealed backtrace damage.
>
> I'm confused why you started the disaster recovery procedure when the
> procedure you follow should result in no damage to the PGs (and
> subsequently CephFS). It'd be helpful to know what this original
> error was.
>
So, when we re-enabled the cephfs, I was monitoring the cluster using
ceph -w and noticed lots of errors scrolling past, something like:

2020-09-03 09:30:24.711 7fd1d2932700 -1 log_channel(cluster) log [ERR] :  replayed ESubtreeMap at 8537805160800 subtree root 0x1 not in cache
2020-09-03 09:30:24.712 7fd1d2932700  0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
2020-09-03 09:30:24.712 7fd1d2932700  0 mds.0.journal journal ambig_subtrees:
2020-09-03 09:30:24.712 7fd1d2932700 -1 log_channel(cluster) log [ERR] :  replayed ESubtreeMap at 8537805208638 subtree root 0x1 not in cache
2020-09-03 09:30:24.712 7fd1d2932700  0 mds.0.journal journal subtrees: {0x1=[],0x100=[]}
2020-09-03 09:30:24.712 7fd1d2932700  0 mds.0.journal journal ambig_subtrees:
2020-09-03 09:30:24.714 7fd1d2932700  0 mds.0.journal EMetaBlob.replay missing dir ino  0x1000003857d
2020-09-03 09:30:24.714 7fd1d2932700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2020-09-03 09:30:24.714 7fd1d2932700  1 mds.store07 respawn!
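
Messages like these are easy to miss in a fast-moving ceph -w stream. A minimal sketch of a filter that picks them out of saved, unwrapped log lines follows; the regular expressions are copied from the lines above and are an assumption, since other Ceph releases may word these messages differently.

```python
import re
from typing import Dict, Iterable, List

# Patterns copied from the MDS messages quoted above; other Ceph
# releases may word these messages differently.
MISSING_DIR = re.compile(r"EMetaBlob\.replay\s+missing dir ino\s+(0x[0-9a-f]+)")
REPLAY_FAILURE = re.compile(r"failure replaying journal \((\w+)\)")
SUBTREE_NOT_CACHED = re.compile(r"subtree root (0x[0-9a-f]+) not in cache")


def scan_mds_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect journal-replay problems from unwrapped ceph -w or MDS log lines."""
    report: Dict[str, List[str]] = {
        "missing_dirs": [],
        "replay_failures": [],
        "subtrees_not_cached": [],
    }
    for line in lines:
        # A single line can match at most one of these in practice, but each
        # pattern is checked independently to keep the logic simple.
        if m := MISSING_DIR.search(line):
            report["missing_dirs"].append(m.group(1))
        if m := REPLAY_FAILURE.search(line):
            report["replay_failures"].append(m.group(1))
        if m := SUBTREE_NOT_CACHED.search(line):
            report["subtrees_not_cached"].append(m.group(1))
    return report
```

Run over a saved log file (for example scan_mds_log(open("mds.log")), where the file name is hypothetical), a non-empty "missing_dirs" list is the signal worth acting on before restarting daemons.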

I, perhaps foolishly, restarted the MDS daemons. Eventually the last one
didn't come back and the cephfs went into an error state.

I am not quite sure what we tried at this stage. I think we started a
cephfs scrub, which found some backtrace errors. However, again perhaps
foolishly, we started using the cephfs during the scrub, and the MDS
crashed when the clients started writing to it. Should we have waited
for the scrub to complete before allowing the clients to write to the
filesystem?

At that point we started the disaster recovery procedure.
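
For reference, the journal steps of that procedure, up to and including the truncation mentioned in the quoted text, can be written out as a dry run. This is a minimal sketch that only builds and prints commands; the filesystem name "cephfs", rank 0 and the backup file name are illustrative assumptions, and the --rank=<fs>:<rank> form follows the cephfs-journal-tool documentation for Mimic and later. The steps should be checked against the Nautilus disaster-recovery page linked above before running anything.

```python
import shlex
from typing import List


def journal_recovery_commands(fs_name: str, rank: int, backup_path: str) -> List[List[str]]:
    """Build (but do not run) the journal steps of the expert recovery procedure."""
    target = f"--rank={fs_name}:{rank}"
    return [
        # 1. Export a backup copy of the journal before anything modifies it.
        ["cephfs-journal-tool", target, "journal", "export", backup_path],
        # 2. Write whatever dentries can be recovered from the journal back
        #    into the metadata pool.
        ["cephfs-journal-tool", target, "event", "recover_dentries", "summary"],
        # 3. Truncate (reset) the journal; events not yet replayed are discarded.
        ["cephfs-journal-tool", target, "journal", "reset"],
    ]


def print_plan(fs_name: str, rank: int, backup_path: str) -> None:
    # Print shell-quoted commands for review instead of executing them.
    for argv in journal_recovery_commands(fs_name, rank, backup_path):
        print(shlex.join(argv))
```

Keeping the plan as data rather than running it makes the destructive third step easy to review first.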

>
> Backtrace damage is usually resolved with a scrub.
>
This is not clear from the documentation.
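
To see what a scrub would have to repair, the MDS damage table can be listed with the damage ls command sent to the active MDS (believed to work as ceph tell mds.<name> damage ls), which prints JSON. A minimal sketch that tallies that output follows; the field names "damage_type" and "path" and the type value "backtrace" are assumptions about the output format and should be checked against real output.

```python
import json
from collections import Counter
from typing import List


def summarize_damage(damage_json: str) -> Counter:
    """Count damage-table entries by their (assumed) "damage_type" field."""
    entries = json.loads(damage_json)
    return Counter(entry.get("damage_type", "unknown") for entry in entries)


def backtrace_damage_paths(damage_json: str) -> List[str]:
    """Return the paths of entries whose type is "backtrace" (assumed field names)."""
    return [
        entry.get("path", "")
        for entry in json.loads(damage_json)
        if entry.get("damage_type") == "backtrace"
    ]
```

Comparing the counts before and after a scrub would show whether the backtrace entries were actually cleared.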

>
> > We have now followed the remaining steps of the disaster recovery
> > procedure and are waiting for the cephfs-data-scan scan_extents to
> > complete. It would be really helpful if you could give an indication
> > of how long this process will take (we have ~40TB in our cephfs) and
> > how many workers to use.
>
> I don't have any recent data on how long it could take but you might
> try using at least 8 workers.
>

We are using 4 workers and the first stage hasn't completed yet. Is it
safe to interrupt and restart the procedure with more workers? Can the
workers be run on different machines?

>
> > The other missing bit of documentation is the cephfs scrubbing. Is
> > that something we should run routinely?
>
> CephFS scrubbing is usually done when something goes wrong or backing
> metadata needs to be updated for some reason as part of an upgrade
> (e.g. Mimic and snapshot formats). It's not considered necessary to do
> it on a routine basis. RADOS PG scrubbing is sufficient for ensuring
> that the backing data is routinely checked for correctness/redundancy.

OK, that's very helpful information. Does the cephfs need to be in a
particular state for a scrub to be run?

Perhaps our restarting the cephfs uncovered an earlier error:

2020-08-31 12:54:45.976 7f10fe790700  0 mds.2.journal EMetaBlob.replay missing dir ino  0x10002024c23
2020-08-31 12:54:45.979 7f10fe790700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2020-08-31 12:54:45.979 7f10fe790700  1 mds.store06 respawn!

which we hadn't appreciated at the time. Would a scrub have resolved that?
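
One way to check whether the directory named in such a message still exists is to look for its object in the metadata pool. CephFS is understood to name directory objects after the inode number in hex plus a fragment suffix (the root, inode 0x1, as 1.00000000); the helper below is a minimal sketch that assumes that scheme and an unfragmented directory.

```python
def dirfrag_object_name(ino: str, frag: int = 0) -> str:
    """Name of the metadata-pool object assumed to hold one fragment of a directory.

    Assumes the '<inode in hex>.<fragment as 8 hex digits>' naming; frag 0 is
    an unfragmented directory.
    """
    value = int(ino, 16)  # accepts both "0x10002024c23" and "10002024c23"
    return f"{value:x}.{frag:08x}"
```

rados -p <metadata pool> stat 10002024c23.00000000 would then report whether that object is present.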

Thanks a lot for your replies.

Regards
magnus

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx