You may want to configure your standby MDS daemons as "standby-replay" so
that the MDS taking over from a failed one needs less time for the
failover. To do this, add something like the following to your ceph.conf:
---snip---
[mds.server1]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.server2]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.server3]
mds_standby_replay = true
mds_standby_for_rank = 0
---snip---
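As a side note (and an assumption on my part, since I don't know your
exact release): if your cluster runs Nautilus, standby-replay can
alternatively be enabled per filesystem instead of per daemon, which for
your "cifs" filesystem would look something like this:
---snip---
# enable standby-replay for the filesystem named "cifs"
ceph fs set cifs allow_standby_replay true
---snip---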
For your setup this would mean one active MDS, one standby-replay (which
takes over almost immediately; depending on the load there could be a
very short interruption) and one regular standby ("cold standby", if you
will). Currently both of your standby MDS servers are "cold".
Quoting DHilsbos@xxxxxxxxxxxxxx:
Eugen;
All services are running, yes, though they didn't all start when I
brought the host up (they were configured not to start, because the last
thing I had done was physically relocate the entire cluster).
All services are now running, and happy.
# ceph status
  cluster:
    id:     1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum S700028,S700029,S700030 (age 20h)
    mgr: S700028(active, since 17h), standbys: S700029, S700030
    mds: cifs:1 {0=S700029=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 21h), 6 in (since 21h)

  data:
    pools:   16 pools, 192 pgs
    objects: 449 objects, 761 MiB
    usage:   724 GiB used, 65 TiB / 66 TiB avail
    pgs:     192 active+clean
# ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       66.17697 root default
-5       22.05899     host S700029
 2   hdd 11.02950         osd.2        up  1.00000 1.00000
 3   hdd 11.02950         osd.3        up  1.00000 1.00000
-7       22.05899     host S700030
 4   hdd 11.02950         osd.4        up  1.00000 1.00000
 5   hdd 11.02950         osd.5        up  1.00000 1.00000
-3       22.05899     host s700028
 0   hdd 11.02950         osd.0        up  1.00000 1.00000
 1   hdd 11.02950         osd.1        up  1.00000 1.00000
The question about configuring the MDS as failover struck me as a
potential issue, since I don't remember doing that; however, it looks
like S700029 (10.0.200.111) took over from S700028 (10.0.200.110) as the
active MDS.
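An easy way to double-check which daemon currently holds rank 0 is, for
example:
---snip---
# show the active and standby MDS daemons for the "cifs" filesystem
ceph fs status cifs
---snip---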
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of Eugen Block
Sent: Thursday, June 27, 2019 8:23 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: MGR Logs after Failure Testing
Hi,
some more information about the cluster status would be helpful, such as
ceph -s
ceph osd tree
and the service status of all MONs, MDSs and MGRs.
Are all services up? Did you configure the spare MDS as standby for
rank 0 so that a failover can happen?
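You can also check what the cluster currently knows about your standby
daemons, for example with:
---snip---
# dump the FSMap, including the list of standby daemons
ceph fs dump
---snip---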
Regards,
Eugen
Quoting DHilsbos@xxxxxxxxxxxxxx:
All;
I built a demonstration and testing cluster, just 3 hosts
(10.0.200.110, 111, 112). Each host runs mon, mgr, osd, mds.
During the demonstration yesterday, I pulled the power on one of the hosts.
After bringing the host back up, I'm getting several error messages
every second or so:
2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed 6) v7 from mds.? v2:10.0.200.112:6808/980053124
2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to return metadata for mds.S700030: (2) No such file or directory
2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed 1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to return metadata for mds.S700029: (2) No such file or directory
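For context, the metadata the mgr is asking the mon for can also be
queried manually, for example with:
---snip---
# query the mon's stored metadata for a specific MDS daemon
ceph mds metadata S700030
---snip---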
Thoughts?
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com