Hello Magnus,

On Thu, Sep 3, 2020 at 11:55 PM Magnus HAGDORN <Magnus.Hagdorn@xxxxxxxx> wrote:
>
> Hi there,
> we reconfigured our ceph cluster yesterday to remove the cluster
> network and things didn't quite go to plan. I am trying to figure out
> what went wrong and also what to do next.
>
> We are running nautilus 14.2.10 on Scientific Linux 7.8.
>
> So, we are using a mixture of RBDs and cephfs. For the transition we
> switched off all machines that are using the RBDs and switched off the
> cephfs using
> ceph fs set one down true
> Once no more MDS were running we reconfigured ceph to remove the
> cluster network and set various flags:
>
> ceph osd set noout
> ceph osd set nodown
> ceph osd set pause
> ceph osd set nobackfill
> ceph osd set norebalance
> ceph osd set norecover
>
> We then restarted the OSDs one host at a time. During this process ceph
> was mostly happy, except for two PGs. After all OSDs had been restarted
> we switched off the cluster network switches to make sure it was
> totally gone. ceph was still happy, and the PG error also disappeared.
> We then unset all those flags and re-enabled cephfs.
>
> We then switched on the servers using the RBDs with no issues. So far
> so good.
>
> We then started using the cephfs (we keep VM images on the cephfs). The
> MDS were showing an error. I restarted the MDS but they didn't come
> back. We then followed the instructions here:
> https://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> up to truncating the journal. The MDS started again. However, as soon
> as we started writing to the cephfs the MDS crashed. A scrub of the
> cephfs revealed backtrace damage.

I'm confused why you started the disaster recovery procedure when the
procedure you followed should result in no damage to the PGs (and
consequently to CephFS). It'd be helpful to know what this original
error was. Backtrace damage is usually resolved with a scrub (see the
scrub sketch appended below).

> We have now followed the remaining steps of the disaster recovery
> procedure and are waiting for the cephfs-data-scan scan_extents to
> complete.
>
> It would be really helpful if you could give an indication of how long
> this process will take (we have ~40TB in our cephfs) and how many
> workers to use.

I don't have any recent data on how long it could take, but you might
try using at least 8 workers (a sketch of the parallel invocation is
appended below).

> The other missing bit of documentation is the cephfs scrubbing. Is that
> something we should run routinely?

CephFS scrubbing is usually done when something goes wrong or when
backing metadata needs to be updated as part of an upgrade (e.g. the
snapshot format change in Mimic). It's not considered necessary on a
routine basis; RADOS PG scrubbing is sufficient to ensure that the
backing data is routinely checked for correctness/redundancy.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
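
For reference, the flag-unset and bring-up step described in the quoted
message would look roughly like this on Nautilus (the filesystem name
"one" is taken from the quoted "ceph fs set one down true"; substitute
your own):

  ceph osd unset noout
  ceph osd unset nodown
  ceph osd unset pause
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset norecover

  # allow MDS daemons to become active again
  ceph fs set one down false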
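
A minimal sketch of the scrub mentioned above, assuming an active MDS
addressed as mds.<name> (<name> is a placeholder; the exact addressing
and option syntax can differ slightly between releases):

  # ask the MDS to walk the tree and repair bad backtraces
  ceph tell mds.<name> scrub start / recursive repair

  # check progress and any recorded metadata damage
  ceph tell mds.<name> scrub status
  ceph tell mds.<name> damage ls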
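
And a sketch of the parallel cephfs-data-scan run with 8 workers,
following the worker_n/worker_m scheme on the linked
disaster-recovery-experts page (<data pool> is a placeholder for the
cephfs data pool name; each worker runs in its own shell, possibly on
different hosts):

  # 8 instances in parallel, one per worker_n from 0 to 7
  cephfs-data-scan scan_extents --worker_n 0 --worker_m 8 <data pool>
  ...
  cephfs-data-scan scan_extents --worker_n 7 --worker_m 8 <data pool>

  # only once *all* scan_extents workers have finished:
  cephfs-data-scan scan_inodes --worker_n 0 --worker_m 8 <data pool>
  ...
  cephfs-data-scan scan_inodes --worker_n 7 --worker_m 8 <data pool>

  # then the single final pass:
  cephfs-data-scan scan_links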