Re: 6 hosts fail cephadm check (15.2.4)

Looks as if your cluster is still running 15.2.1: the packages on the hosts are already 15.2.4, but your "ceph versions" output below shows every daemon (mon, mgr, osd, mds) still on 15.2.1.

Have a look at https://docs.ceph.com/docs/master/cephadm/upgrade/
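
In case it helps, that page essentially boils down to (assuming the cluster is already managed by cephadm and can pull the 15.2.4 container image):

  ceph orch upgrade start --ceph-version 15.2.4

and then watching it with

  ceph orch upgrade status
  ceph versions

until everything reports 15.2.4.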

On 28.07.20 at 09:57, Ml Ml wrote:
> Hello,
> 
> I get:
> 
> [WRN] CEPHADM_HOST_CHECK_FAILED: 6 hosts fail cephadm check
>     host ceph01 failed check: Failed to connect to ceph01 (ceph01).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> you may want to run:
>> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph01
>     host ceph02 failed check: Failed to connect to ceph02 (10.10.1.2).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> you may want to run:
>> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph02
>     host ceph03 failed check: Failed to connect to ceph03 (10.10.1.3).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> you may want to run:
>> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph03
>     host ceph04 failed check: Failed to connect to ceph04 (10.10.1.4).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> you may want to run:
>> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph04
>     host ceph05 failed check: Failed to connect to ceph05 (10.10.1.5).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> you may want to run:
>> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@ceph05
>     host ceph06 failed check: Failed to connect to ceph06 (10.10.1.6).
> Check that the host is reachable and accepts connections using the
> cephadm SSH key
> 
> 
> On ceph01 I run:
> ceph cephadm get-ssh-config > /tmp/ceph.conf
> ceph config-key get mgr/cephadm/ssh_identity_key > /tmp/ceph.key
> chmod 600 /tmp/ceph.key
> ssh -F /tmp/ceph.conf -i /tmp/ceph.key root@ceph01 (which works)
> 
> So I cannot understand the errors above.
> 
> root@ceph01:~# ceph versions
> {
>     "mon": {
>         "ceph version 15.2.1
> (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
>     },
>     "mgr": {
>         "ceph version 15.2.1
> (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 3
>     },
>     "osd": {
>         "ceph version 15.2.1
> (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 56
>     },
>     "mds": {
>         "ceph version 15.2.1
> (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 1
>     },
>     "overall": {
>         "ceph version 15.2.1
> (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)": 63
>     }
> }
> 
> root@ceph01:~# dpkg -l |grep ceph
> ii  ceph-base                               15.2.4-1~bpo10+1
>    amd64        common ceph daemon libraries and management tools
> ii  ceph-common                             15.2.4-1~bpo10+1
>    amd64        common utilities to mount and interact with a ceph
> storage cluster
> ii  ceph-deploy                             2.0.1
>    all          Ceph-deploy is an easy to use configuration tool
> ii  ceph-fuse                               15.2.4-1~bpo10+1
>    amd64        FUSE-based client for the Ceph distributed file system
> ii  ceph-grafana-dashboards                 15.2.4-1~bpo10+1
>    all          grafana dashboards for the ceph dashboard
> ii  ceph-mds                                15.2.4-1~bpo10+1
>    amd64        metadata server for the ceph distributed file system
> ii  ceph-mgr                                15.2.4-1~bpo10+1
>    amd64        manager for the ceph distributed storage system
> ii  ceph-mgr-cephadm                        15.2.4-1~bpo10+1
>    all          cephadm orchestrator module for ceph-mgr
> ii  ceph-mgr-dashboard                      15.2.4-1~bpo10+1
>    all          dashboard module for ceph-mgr
> ii  ceph-mgr-diskprediction-cloud           15.2.4-1~bpo10+1
>    all          diskprediction-cloud module for ceph-mgr
> ii  ceph-mgr-diskprediction-local           15.2.4-1~bpo10+1
>    all          diskprediction-local module for ceph-mgr
> ii  ceph-mgr-k8sevents                      15.2.4-1~bpo10+1
>    all          kubernetes events module for ceph-mgr
> ii  ceph-mgr-modules-core                   15.2.4-1~bpo10+1
>    all          ceph manager modules which are always enabled
> ii  ceph-mon                                15.2.4-1~bpo10+1
>    amd64        monitor server for the ceph storage system
> ii  ceph-osd                                15.2.4-1~bpo10+1
>    amd64        OSD server for the ceph storage system
> ii  cephadm                                 15.2.4-1~bpo10+1
>    amd64        cephadm utility to bootstrap ceph daemons with systemd
> and containers
> ii  libcephfs1                              10.2.11-2
>    amd64        Ceph distributed file system client library
> ii  libcephfs2                              15.2.4-1~bpo10+1
>    amd64        Ceph distributed file system client library
> ii  python-ceph-argparse                    14.2.8-1
>    all          Python 2 utility libraries for Ceph CLI
> ii  python3-ceph-argparse                   15.2.4-1~bpo10+1
>    all          Python 3 utility libraries for Ceph CLI
> ii  python3-ceph-common                     15.2.4-1~bpo10+1
>    all          Python 3 utility libraries for Ceph
> ii  python3-cephfs                          15.2.4-1~bpo10+1
>    amd64        Python 3 libraries for the Ceph libcephfs library
> 
> root@ceph01:~# ceph -s
>   cluster:
>     id:     5436dd5d-83d4-4dc8-a93b-60ab5db145df
>     health: HEALTH_WARN
>             6 hosts fail cephadm check
>             failed to probe daemons or devices
>             7 nearfull osd(s)
>             Reduced data availability: 1 pg inactive
>             Low space hindering backfill (add storage if this doesn't
> resolve itself): 26 pgs backfill_toofull
>             Degraded data redundancy: 202495/33226941 objects degraded
> (0.609%), 26 pgs degraded, 26 pgs undersized
>             3 pool(s) nearfull
> 
>   services:
>     mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 39m)
>     mgr: ceph02(active, since 77m), standbys: ceph03, ceph01
>     mds:  2 up:standby
>     osd: 61 osds: 56 up (since 41m), 55 in (since 41m); 27 remapped pgs
> 
>   data:
>     pools:   3 pools, 2049 pgs
>     objects: 11.08M objects, 37 TiB
>     usage:   113 TiB used, 28 TiB / 141 TiB avail
>     pgs:     0.049% pgs not active
>              202495/33226941 objects degraded (0.609%)
>              9238/33226941 objects misplaced (0.028%)
>              1025 active+clean
>              887  active+clean+snaptrim_wait
>              110  active+clean+snaptrim
>              25   active+undersized+degraded+remapped+backfill_toofull
>              1    undersized+degraded+remapped+backfill_toofull+peered
>              1    active+remapped+backfilling
> 
>   io:
>     client:   1.0 KiB/s rd, 140 KiB/s wr, 0 op/s rd, 1 op/s wr
>     recovery: 30 MiB/s, 8 objects/s
> 
> I already restarted the mgr on ceph02.
> 
> Thanks,
> Michael
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
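
(Untested suggestion regarding the SSH test above: the check that raises CEPHADM_HOST_CHECK_FAILED is performed by the active mgr itself, so re-running it through the orchestrator may be more telling than testing from your own shell. If your build already has the command, something like

  ceph cephadm check-host ceph01

should repeat the check from the mgr's point of view.)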

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer


