Re: Cephfs - MDS all up:standby, not becoming up:active

Hi,

I am seeing the same thing after upgrading to 16.2.6: all MDS daemons are standby.

After setting
ceph fs set cephfs max_mds 1
ceph fs set cephfs allow_standby_replay false
the MDS daemons still only want to be standby.

2021-09-17T14:40:59.371+0200 7f810a58f600  0 ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable), process ceph-mds, pid 7113
2021-09-17T14:40:59.371+0200 7f810a58f600  1 main not setting numa affinity
2021-09-17T14:40:59.371+0200 7f810a58f600  0 pidfile_write: ignore empty --pid-file
2021-09-17T14:40:59.375+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to version 226251 from mon.0
2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Updating MDS map to version 226252 from mon.0
2021-09-17T14:41:00.455+0200 7f8105cf1700  1 mds.ceph3 Monitors have assigned me to become a standby.
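
For reference, this is roughly how I am checking the file system and MDS map state while it sits like this (the file system name cephfs is the same one used in the commands above):

ceph fs status
ceph mds stat
ceph fs dump | grep -i -A3 compat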

Setting add_incompat 1 also does not work:
# ceph fs compat cephfs add_incompat 1
Error EINVAL: adding a feature requires a feature string
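
Judging by the error, the command seems to want the feature name as a second argument in addition to the ID; the names are listed in the compat section of "ceph fs dump". So presumably something along these lines (the "base v0.20" string is only my guess for feature 1, please take the actual string from your own dump output):

ceph fs dump | grep compat
ceph fs compat cephfs add_incompat 1 "base v0.20"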

Any ideas?

On Fri, Sep 17, 2021 at 2:19 PM Joshua West <josh@xxxxxxx> wrote:

> Thanks Patrick,
>
> Similar to Robert, when trying that, I simply receive "Error EINVAL:
> adding a feature requires a feature string" ten times.
>
> I attempted to downgrade, but wasn't able to get my mons to come back
> up, as they had Quincy-specific "mon data structure changes" or
> something like that.
> So I've settled on "17.0.0-6762-g0ff2e281889" for my cluster.
>
> cephfs is still down all this time later. (Good thing this is a
> learning cluster and not production, haha.)
>
> I began to feel more and more that the issue was related to a damaged
> cephfs, after a recent set of server malfunctions on a single node
> caused mayhem on the cluster.
> (I went away for a bit and came back to find that one node had been
> killing itself every hour for two weeks, having gone on strike from
> the heat in the garage where it lives.)
>
> I recently went through the cephfs disaster recovery steps from the
> docs, pausing between some steps (as the docs suggest) to check
> whether things were working:
> cephfs-journal-tool --rank=cephfs:0 journal inspect
> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> cephfs-journal-tool --rank=cephfs:0 journal reset
> ceph fs reset cephfs --yes-i-really-mean-it
> #Check if working
> cephfs-table-tool all reset session
> cephfs-table-tool all reset snap
> cephfs-table-tool all reset inode
> #Check if working
> cephfs-data-scan init
>
> for ID in `seq 0 511`; do cephfs-data-scan scan_extents --worker_n $ID --worker_m 512 cephfs_data & done
> for ID in `seq 0 511`; do cephfs-data-scan scan_inodes --worker_n $ID --worker_m 512 cephfs_data & done
> (If anyone here can update the docs: cephfs-data-scan scan_extents and
> scan_inodes could use a for loop with many workers. I had to abandon
> the 4-worker run from the docs after more than a week, whereas running
> 512 workers finished in a day. See the sketch below.)
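>
> Roughly what I mean for the docs is something like this, with a wait
> between the two phases so scan_inodes only starts once scan_extents
> has finished (the worker count of 512 is just what worked on my
> hardware):
>
> for ID in `seq 0 511`; do
>   cephfs-data-scan scan_extents --worker_n $ID --worker_m 512 cephfs_data &
> done
> wait
> for ID in `seq 0 511`; do
>   cephfs-data-scan scan_inodes --worker_n $ID --worker_m 512 cephfs_data &
> done
> wait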
>
> cephfs-data-scan scan_links
> cephfs-data-scan cleanup cephfs_data
>
> But the MDS daemons still fail to come up, though the error has changed.
>
> ceph fs set cephfs max_mds 1
> ceph fs set cephfs allow_standby_replay false
>
> systemctl start ceph-mds@rog
> SEE ATTACHED LOGS
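>
> (If more verbose logs would help, I can re-run the MDS with the debug
> levels turned up first, e.g. something like the following; the values
> are just a guess at what would be useful.)
>
> ceph config set mds debug_mds 20
> ceph config set mds debug_ms 1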
>
>
>
>
> Any guidance that can be offered would be greatly appreciated, as I've
> been without my cephfs data for almost 3 months now.
>
> Joshua
>
> On Fri, Sep 17, 2021 at 3:53 AM Robert Sander
> <r.sander@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I had to run
> >
> > ceph fs set cephfs max_mds 1
> > ceph fs set cephfs allow_standby_replay false
> >
> > and then stop all MDS and NFS containers and start them one after
> > the other again to clear this issue.
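> >
> > Roughly what that looked like here (this is a cephadm deployment,
> > and the daemon names below are only examples; take the real ones
> > from "ceph orch ps"):
> >
> > ceph orch ps | grep -E 'mds|nfs'
> > ceph orch daemon stop mds.cephfs.host1.abcdef    # repeat for every MDS daemon
> > ceph orch daemon stop nfs.cephnfs.host1.ghijkl   # repeat for every NFS daemon
> > ceph orch daemon start mds.cephfs.host1.abcdef   # then start them again one at a time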
> >
> > Regards
> > --
> > Robert Sander
> > Heinlein Consulting GmbH
> > Schwedter Str. 8/9b, 10119 Berlin
> >
> > https://www.heinlein-support.de
> >
> > Tel: 030 / 405051-43
> > Fax: 030 / 405051-19
> >
> > Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> > Geschäftsführer: Peer Heinlein - Sitz: Berlin
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



