You may want to configure your standby MDS daemons as "standby-replay" so
that the MDS taking over from a failed one needs less time for the
failover. To do this, add something like the following to your ceph.conf:
---snip---
[mds.server1]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.server2]
mds_standby_replay = true
mds_standby_for_rank = 0
[mds.server3]
mds_standby_replay = true
mds_standby_for_rank = 0
---snip---
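As a side note (and an assumption on my part, since I don't know your
exact release): if your cluster runs Nautilus, standby-replay can
alternatively be enabled per filesystem instead of per daemon, which for
your "cifs" filesystem would look something like this:
---snip---
# enable standby-replay for the filesystem named "cifs"
ceph fs set cifs allow_standby_replay true
---snip---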
For your setup this would mean one active MDS, one standby-replay (which
takes over almost immediately; depending on the load there could be a
very short interruption) and one regular standby ("cold standby", if you
will). Currently both of your standby MDS servers are "cold".
Quoting DHilsbos@xxxxxxxxxxxxxx:
Eugen;
All services are running, yes, though they didn't all start when I
brought the host up (they were configured not to start, because the last
thing I had done was physically relocate the entire cluster).
All services are now running, and happy.
# ceph status
  cluster:
    id:     1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum S700028,S700029,S700030 (age 20h)
    mgr: S700028(active, since 17h), standbys: S700029, S700030
    mds: cifs:1 {0=S700029=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 21h), 6 in (since 21h)

  data:
    pools:   16 pools, 192 pgs
    objects: 449 objects, 761 MiB
    usage:   724 GiB used, 65 TiB / 66 TiB avail
    pgs:     192 active+clean
# ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       66.17697 root default
-5       22.05899     host S700029
 2   hdd 11.02950         osd.2        up  1.00000 1.00000
 3   hdd 11.02950         osd.3        up  1.00000 1.00000
-7       22.05899     host S700030
 4   hdd 11.02950         osd.4        up  1.00000 1.00000
 5   hdd 11.02950         osd.5        up  1.00000 1.00000
-3       22.05899     host s700028
 0   hdd 11.02950         osd.0        up  1.00000 1.00000
 1   hdd 11.02950         osd.1        up  1.00000 1.00000
The question about configuring the MDS as failover struck me as a
potential issue, since I don't remember doing that; however, it looks
like S700029 (10.0.200.111) took over from S700028 (10.0.200.110) as the
active MDS.
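An easy way to double-check which daemon currently holds rank 0 is, for
example:
---snip---
# show the active and standby MDS daemons for the "cifs" filesystem
ceph fs status cifs
---snip---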
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of Eugen Block
Sent: Thursday, June 27, 2019 8:23 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: MGR Logs after Failure Testing
Hi,
some more information about the cluster status would be helpful, such as
ceph -s
ceph osd tree
and the service status of all MONs, MDSs and MGRs.
Are all services up? Did you configure the spare MDS as standby for
rank 0 so that a failover can happen?
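You can also check what the cluster currently knows about your standby
daemons, for example with:
---snip---
# dump the FSMap, including the list of standby daemons
ceph fs dump
---snip---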
Regards,
Eugen
Quoting DHilsbos@xxxxxxxxxxxxxx:
All;
I built a demonstration and testing cluster, just 3 hosts
(10.0.200.110, 111, 112). Each host runs mon, mgr, osd, mds.
During the demonstration yesterday, I pulled the power on one of the hosts.
After bringing the host back up, I'm getting several error messages
every second or so:
2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed 6) v7 from mds.? v2:10.0.200.112:6808/980053124
2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to return metadata for mds.S700030: (2) No such file or directory
2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch: unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed 1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to return metadata for mds.S700029: (2) No such file or directory
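For context, the metadata the mgr is asking the mon for can also be
queried manually, for example with:
---snip---
# query the mon's stored metadata for a specific MDS daemon
ceph mds metadata S700030
---snip---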
Thoughts?
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com