Re: Cephfs - MDS all up:standby, not becoming up:active

Thanks Patrick,

Similar to Robert, when trying that, I simply receive "Error EINVAL:
adding a feature requires a feature string" 10 times.
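
(For anyone who hits the same EINVAL later: my assumption is that the
command in question was the "ceph fs compat <fs> add_incompat" form,
which wants both a numeric feature ID and a quoted feature string, and
the error above is what you get when the string is left off. The IDs
and strings below are only an example; take the real ones from the
incompat list shown by "ceph fs dump" on your own cluster.)

# Assumed form of the suggested command; feature IDs/strings must match
# the incompat set reported by "ceph fs dump".
ceph fs compat cephfs add_incompat 1 "base v0.20"
ceph fs compat cephfs add_incompat 2 "client writeable ranges"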

I attempted to downgrade, but wasn't able to get my mons to come back
up, as they had Quincy-specific "mon data structure changes" or
something like that.
So I've settled on "17.0.0-6762-g0ff2e281889" for my cluster.

cephfs is still down all this time later. (Good thing this is a
learning cluster, not production, haha.)

I began to feel more and more that the issue was a damaged cephfs,
caused by a recent set of server malfunctions on a single node that
wreaked mayhem on the cluster.
(I went away for a bit and came back to find one node had been killing
itself every hour for 2 weeks, having gone on strike from the heat in
the garage where it lives.)

I recently went through the cephfs disaster recovery steps per the
docs, pausing where the docs suggest to check whether things were
working again:
cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
ceph fs reset cephfs --yes-i-really-mean-it
#Check if working
cephfs-table-tool all reset session
cephfs-table-tool all reset snap
cephfs-table-tool all reset inode
#Check if working
cephfs-data-scan init

for ID in `seq 0 511`; do cephfs-data-scan scan_extents \
  --worker_n $ID --worker_m 512 cephfs_data & done
for ID in `seq 0 511`; do cephfs-data-scan scan_inodes \
  --worker_n $ID --worker_m 512 cephfs_data & done
(If anyone here can update the docs: the cephfs-data-scan scan_extents
and scan_inodes examples could use a for loop with many workers,
something like the sketch below. I had to abandon the 4-worker run from
the docs after more than a week, but 512 workers finished in a day.)
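
(Concretely, something like this is what I'd suggest for the docs;
NWORKERS is just a placeholder name I'm using here, and each phase
should be allowed to finish before the next one starts:)

NWORKERS=512
for ID in `seq 0 $((NWORKERS - 1))`; do
  cephfs-data-scan scan_extents --worker_n $ID \
    --worker_m $NWORKERS cephfs_data &
done
wait  # all scan_extents workers must finish before scan_inodes starts
for ID in `seq 0 $((NWORKERS - 1))`; do
  cephfs-data-scan scan_inodes --worker_n $ID \
    --worker_m $NWORKERS cephfs_data &
done
wait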

cephfs-data-scan scan_links
cephfs-data-scan cleanup cephfs_data

But the MDS daemons still fail to come up, though the error has changed.

ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false

systemctl start ceph-mds@rog
SEE ATTACHED LOGS
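
(For reference, the state checks I'm doing after each restart attempt
look roughly like this; "rog" is the MDS host from above:)

ceph fs status
ceph mds stat
ceph fs dump                         # MDS map, standby daemons, compat flags
journalctl -u ceph-mds@rog -n 100    # tail of the MDS log for this attempt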




Any guidance that can be offered would be greatly appreciated, as I've
been without my cephfs data for almost 3 months now.

Joshua

On Fri, Sep 17, 2021 at 3:53 AM Robert Sander
<r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I had to run
>
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
>
> and stop all MDS and NFS containers and start one after the other again
> to clear this issue.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



