damaged cephfs

Hi there,
We reconfigured our ceph cluster yesterday to remove the cluster
network and things didn't quite go to plan. I am trying to figure out
what went wrong and what to do next.

We are running nautilus 14.2.10 on Scientific Linux 7.8.

So, we are using a mixture of RBDs and cephfs. For the transition we
switched off all machines using the RBDs and shut down the cephfs
with
ceph fs set one down true
Once no more MDS were running we reconfigured ceph to remove the
cluster network and set the following flags:

ceph osd set noout
ceph osd set nodown
ceph osd set pause
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover

We then restarted the OSDs one host at a time. During this process ceph
was mostly happy, apart from errors on two PGs. After all OSDs had been
restarted we switched off the cluster network switches to make sure the
cluster network was completely gone. ceph was still happy and the PG
errors also disappeared. We then unset all those flags and re-enabled
the cephfs.
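
For reference, the unset and re-enable steps were roughly the following
(exact order may have differed):

ceph osd unset noout
ceph osd unset nodown
ceph osd unset pause
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover
ceph fs set one down false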

We then switched on the servers using the RBDs with no issues. So far
so good.

We then started using the cephfs (we keep VM images on the cephfs). The
MDS were showing an error. I restarted the MDS but they didn't come
back. We then followed the instructions here:
https://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
up to truncating the journal. The MDS started again. However, as soon
as we started writing to the cephfs the MDS crashed. A scrub of the
cephfs revealed backtrace damage.
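
For anyone following along, the journal steps in that document look
roughly like this for our setup (assuming filesystem name "one" and
rank 0; the exact invocations are in the linked page):

cephfs-journal-tool --rank=one:0 journal export backup.bin       # backup first
cephfs-journal-tool --rank=one:0 event recover_dentries summary
cephfs-journal-tool --rank=one:0 journal reset                   # the truncation step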

We have now followed the remaining steps of the disaster recovery
procedure and are waiting for the cephfs-data-scan scan_extents to
complete.

It would be really helpful if you could give an indication of how long
this process will take (we have ~40TB in our cephfs) and how many
workers to use.
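
(For context, the parallel form given in that document looks roughly
like this, e.g. for 4 workers, where <data pool> is the cephfs data
pool and each command runs in its own shell:

cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 <data pool>
cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 <data pool>
cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 <data pool>

so the question is really how high we can sensibly push worker_m.)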

The other missing bit of documentation is cephfs scrubbing. Is that
something we should run routinely?
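
(As far as we understand it, in nautilus a recursive forward scrub and
the damage listing are driven with something like the following, where
<name> is the active MDS daemon; happy to be corrected:

ceph tell mds.<name> scrub start / recursive
ceph tell mds.<name> damage ls

which is roughly how we got the backtrace damage report, though I may
be misremembering the exact form.)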

Regards
magnus
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.