Eugen;

All services are running, yes, though they didn't all start when I brought the host up (they were configured not to start, because the last thing I had done was physically relocate the entire cluster). All services are running, and happy.

# ceph status
  cluster:
    id:     1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum S700028,S700029,S700030 (age 20h)
    mgr: S700028(active, since 17h), standbys: S700029, S700030
    mds: cifs:1 {0=S700029=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 21h), 6 in (since 21h)

  data:
    pools:   16 pools, 192 pgs
    objects: 449 objects, 761 MiB
    usage:   724 GiB used, 65 TiB / 66 TiB avail
    pgs:     192 active+clean

# ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       66.17697 root default
-5       22.05899     host S700029
 2   hdd 11.02950         osd.2        up  1.00000 1.00000
 3   hdd 11.02950         osd.3        up  1.00000 1.00000
-7       22.05899     host S700030
 4   hdd 11.02950         osd.4        up  1.00000 1.00000
 5   hdd 11.02950         osd.5        up  1.00000 1.00000
-3       22.05899     host s700028
 0   hdd 11.02950         osd.0        up  1.00000 1.00000
 1   hdd 11.02950         osd.1        up  1.00000 1.00000

The question about configuring the MDS for failover struck me as a potential issue, since I don't remember doing that; however, it looks like S700029 (10.0.200.111) took over from S700028 (10.0.200.110) as the active MDS.

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Eugen Block
Sent: Thursday, June 27, 2019 8:23 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: MGR Logs after Failure Testing

Hi,

some more information about the cluster status would be helpful, such as

ceph -s
ceph osd tree

and the service status of all MONs, MDSs, and MGRs. Are all services up?
Did you configure the spare MDS as standby for rank 0 so that a failover can happen?
Regards,
Eugen

Zitat von DHilsbos@xxxxxxxxxxxxxx:

> All;
>
> I built a demonstration and testing cluster, just 3 hosts
> (10.0.200.110, 111, 112). Each host runs mon, mgr, osd, mds.
>
> During the demonstration yesterday, I pulled the power on one of the hosts.
>
> After bringing the host back up, I'm getting several error messages
> every second or so:
> 2019-06-26 16:01:56.424 7fcbe0af9700  0 ms_deliver_dispatch:
> unhandled message 0x55e80a728f00 mgrreport(mds.S700030 +0-0 packed
> 6) v7 from mds.? v2:10.0.200.112:6808/980053124
> 2019-06-26 16:01:56.425 7fcbf4cd1700  1 mgr finish mon failed to
> return metadata for mds.S700030: (2) No such file or directory
> 2019-06-26 16:01:56.429 7fcbe0af9700  0 ms_deliver_dispatch:
> unhandled message 0x55e809f8e600 mgrreport(mds.S700029 +110-0 packed
> 1366) v7 from mds.0 v2:10.0.200.111:6808/2726495738
> 2019-06-26 16:01:56.430 7fcbf4cd1700  1 mgr finish mon failed to
> return metadata for mds.S700029: (2) No such file or directory
>
> Thoughts?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
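For readers following this thread: the standby/failover behavior Eugen asks about can be inspected and tuned from the CLI. A minimal sketch, assuming a Nautilus-era cluster (where any running standby MDS will automatically take over a failed rank) and the filesystem name "cifs" taken from the `ceph status` output above:

```shell
# Show the active MDS and which daemons are standing by for the filesystem
ceph fs status cifs

# Dump the full FSMap, including standby daemons and their states
ceph fs dump

# Optionally let one standby follow the active MDS's journal
# (standby-replay) for a faster failover
ceph fs set cifs allow_standby_replay true
```

With plain standbys (no standby-replay), failover still happens automatically; the replacement MDS just has to replay the journal first, so takeover is somewhat slower.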