Re: PGs unknown (osd down) after conversion to cephadm


 



Hi Sebastian,

 

of course! I misspelled the option. Sometimes it’s difficult to see the forest for the trees…

 

But after upgrading to 15.2.1 I now have the CEPHADM_STRAY_HOST problem:

 

HEALTH_WARN 3 stray host(s) with 15 daemon(s) not managed by cephadm

[WRN] CEPHADM_STRAY_HOST: 3 stray host(s) with 15 daemon(s) not managed by cephadm

    stray host ceph1 has 5 stray daemons: ['mds.media.ceph1.xzotzy', 'mgr.ceph1', 'mon.ceph1', 'osd.0', 'osd.1']

    stray host ceph2 has 5 stray daemons: ['mds.media.ceph2.bitmic', 'mgr.ceph2', 'mon.ceph2', 'osd.2', 'osd.3']

    stray host ceph3 has 5 stray daemons: ['mds.media.ceph3.rlxujb', 'mgr.ceph3', 'mon.ceph3', 'osd.4', 'osd.5']

 

Maybe related to the hostname vs. FQDN mismatch issue?
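
As far as I understand, cephadm flags a host or daemon as "stray" when the hostname a daemon reports in its metadata is not in the orchestrator’s host list; the stray list above shows the short names, while ceph orch host ls (further down) knows the hosts by their FQDNs. A quick way to compare the two views on one node, just as a sketch with the names from my setup:

hostname                                      # what the node calls itself
sudo ceph mon metadata ceph1 | grep hostname  # the daemons report "ceph1"
sudo ceph orch host ls                        # cephadm has "ceph1.domainname.de"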

 

My mon metadata (for one node):

 

        "name": "ceph1",

        "addrs": "[v2:10.10.0.10:3300/0,v1:10.10.0.10:6789/0]",

        "arch": "x86_64",

        "ceph_release": "octopus",

        "ceph_version": "ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)",

        "ceph_version_short": "15.2.1",

        "compression_algorithms": "none, snappy, zlib, zstd, lz4",

        "container_hostname": "ceph1.domainname.de",

        "container_image": "ceph/ceph:v15.2.1",

        "cpu": "Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz",

        "device_ids": "sda=INTEL_SSDSC2KB480G8_PHYF924001VB480BGN",

        "device_paths": "sda=/dev/disk/by-path/pci-0000:00:1f.2-ata-1",

        "devices": "sda",

        "distro": "centos",

        "distro_description": "CentOS Linux 8 (Core)",

        "distro_version": "8",

        "hostname": "ceph1",

        "kernel_description": "#201910180137 SMP Fri Oct 18 01:40:58 UTC 2019",

        "kernel_version": "4.19.80-041980-generic",

        "mem_swap_kb": "0",

        "mem_total_kb": "65936872",

        "os": "Linux"

 

and again the output of ceph orch host ls:

 

HOST                 ADDR                 LABELS  STATUS

ceph1.domainname.de  ceph1.domainname.de

ceph2.domainname.de  ceph2.domainname.de

ceph3.domainname.de  ceph3.domainname.de
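
If the mismatch is indeed the cause, I suppose the cleanest fix is to make both sides use the same name, e.g. by re-adding each host under the short name the daemons report and then dropping the FQDN entry. Untested sketch for ceph1 (the address comes from the mon metadata above, ceph2/ceph3 would be analogous; judging from the add_host error earlier in this thread, cephadm’s check also wants the system hostname to match the name being added):

sudo hostnamectl set-hostname ceph1
sudo ceph orch host add ceph1 10.10.0.10
sudo ceph orch host rm ceph1.domainname.de

If the warning is harmless in this situation, it can apparently also be silenced with "ceph config set mgr mgr/cephadm/warn_on_stray_hosts false", but I’d rather fix the naming.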

 

Thx,

 

Marco

 

 

From: Sebastian Wagner
Sent: Thursday, April 16, 2020 21:23
To: Dr. Marco Savoca; ceph-users@xxxxxxx
Subject: Re: Re: PGs unknown (osd down) after conversion to cephadm

 

Hi Marco,

 

# ceph orch upgrade start --ceph-version 15.2.1

 

should do the trick.
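
You can follow the progress with something like

# ceph orch upgrade status
# ceph -W cephadm

(the second command streams the cephadm log channel, i.e. the same information as ceph log last cephadm).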

 

 

 

On 15.04.20 at 17:40, Dr. Marco Savoca wrote:

> Hi Sebastian,

>

>  

>

> as I said, the orchestrator does not seem to be reachable after the

> cluster’s reboot. The requested output could only be gathered after a

> manual restart of the osd containers. By the way, if I try to upgrade to

> v15.2.1 via cephadm (ceph orch upgrade start --version 15.2.1), I only

> get the output “ceph version 15.2.0

> (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)” and the upgrade

> does not start:

>

> sudo ceph orch upgrade status

>

> {

>

>     "target_image": null,

>

>     "in_progress": false,

>

>     "services_complete": [],

>

>     "message": ""

>

> }
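
As noted at the top of this thread, the upgrade never started here simply because of the misspelled flag: ceph orch upgrade start expects --ceph-version rather than --version, i.e.

sudo ceph orch upgrade start --ceph-version 15.2.1

and with that the upgrade to 15.2.1 went through.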

>

>  

>

> Maybe it’s time to open a ticket.

>

>  

>

> Here the requested outputs.

>

>  

>

> sudo ceph orch host ls --format json

>

>  

>

> [{"addr": "ceph1.domainname.de", "hostname": "ceph1.domainname.de",

> "labels": [], "status": ""}, {"addr": "ceph2.domainname.de", "hostname":

> "ceph2.domainname.de", "labels": [], "status": ""}, {"addr":

> "ceph3.domainname.de", "hostname": "ceph3.domainname.de", "labels": [],

> "status": ""}]

>

>  

>

> sudo ceph orch ls --format json

>

>  

>

> [{"container_image_id":

> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",

> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":

> "mds.media", "size": 3, "running": 3, "spec": {"placement": {"count":

> 3}, "service_type": "mds", "service_id": "media"}, "last_refresh":

> "2020-04-15T15:26:53.664473", "created": "2020-03-30T23:51:32.239555"},

> {"container_image_id":

> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",

> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":

> "mgr", "size": 0, "running": 3, "last_refresh":

> "2020-04-15T15:26:53.664098"}, {"container_image_id":

> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",

> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":

> "mon", "size": 0, "running": 3, "last_refresh":

> "2020-04-15T15:26:53.664270"}]

>

>  

>

> Thanks,

>

>  

>

> Marco

>

>  

>

>  

>

> *From:* Sebastian Wagner <mailto:swagner@xxxxxxxx>

> *Sent:* Tuesday, April 14, 2020 16:53

> *To:* ceph-users@xxxxxxx <mailto:ceph-users@xxxxxxx>

> *Subject:* Re: PGs unknown (osd down) after conversion to cephadm

>

>  

>

> Might be an issue with cephadm.

>

>  

>

> Do you have the output of `ceph orch host ls --format json` and `ceph

>

> orch ls --format json`?

>

>  

>

> On 09.04.20 at 13:23, Dr. Marco Savoca wrote:

>

>> Hi all,

>

>> 

>

>>  

>

>> 

>

>> last week I successfully upgraded my cluster to Octopus and converted it

>

>> to cephadm. The conversion process (according to the docs) went well and

>

>> the cluster ran in an active+clean status.

>

>> 

>

>>  

>

>> 

>

>> But a couple of minutes after a reboot, all osds went down and

>

>> all (100%) of the PGs ended up in the unknown state. The

>

>> orchestrator isn’t reachable in this state (ceph orch status

>

>> never returns).

>

>> 

>

>>  

>

>> 

>

>> A manual restart of the osd-daemons resolved the problem and the cluster

>

>> is now active+clean again.

>

>> 

>

>>  

>

>> 

>

>> This behavior is reproducible.

>

>> 

>

>>  

>

>> 

>

>>  

>

>> 

>

>> The “ceph log last cephadm” command spits out (redacted):

>

>> 

>

>>  

>

>> 

>

>>  

>

>> 

>

>> 2020-03-30T23:07:06.881061+0000 mgr.ceph2 (mgr.1854484) 42 : cephadm

>

>> [INF] Generating ssh key...

>

>> 

>

>> 2020-03-30T23:22:00.250422+0000 mgr.ceph2 (mgr.1854484) 492 : cephadm

>

>> [ERR] _Promise failed

>

>> 

>

>> Traceback (most recent call last):

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work

>

>> 

>

>>     res = self._on_complete_(*args, **kwargs)

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>

>

>> 

>

>>     return cls(_on_complete_=lambda x: f(*x), value=args, name=name,

>

>> **c_kwargs)

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1648, in add_host

>

>> 

>

>>     spec.hostname, spec.addr, err))

>

>> 

>

>> orchestrator._interface.OrchestratorError: New host ceph1 (ceph1) failed

>

>> check: ['INFO:cephadm:podman|docker (/usr/bin/docker) is present',

>

>> 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present',

>

>> 'INFO:cephadm:Unit systemd-timesyncd.service is enabled and running',

>

>> 'ERROR: hostname "ceph1.domain.de" does not match expected hostname

>

>> "ceph1"']

>

>> 

>

>> 2020-03-30T23:22:27.267344+0000 mgr.ceph2 (mgr.1854484) 508 : cephadm

>

>> [INF] Added host ceph1.domain.de

>

>> 

>

>> 2020-03-30T23:22:36.078462+0000 mgr.ceph2 (mgr.1854484) 515 : cephadm

>

>> [INF] Added host ceph2.domain.de

>

>> 

>

>> 2020-03-30T23:22:55.200280+0000 mgr.ceph2 (mgr.1854484) 527 : cephadm

>

>> [INF] Added host ceph3.domain.de

>

>> 

>

>> 2020-03-30T23:23:17.491596+0000 mgr.ceph2 (mgr.1854484) 540 : cephadm

>

>> [ERR] _Promise failed

>

>> 

>

>> Traceback (most recent call last):

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work

>

>> 

>

>>     res = self._on_complete_(*args, **kwargs)

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>

>

>> 

>

>>     return cls(_on_complete_=lambda x: f(*x), value=args, name=name,

>

>> **c_kwargs)

>

>> 

>

>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1648, in add_host

>

>> 

>

>>     spec.hostname, spec.addr, err))

>

>> 

>

>> orchestrator._interface.OrchestratorError: New host ceph1 (10.10.0.10)

>

>> failed check: ['INFO:cephadm:podman|docker (/usr/bin/docker) is

>

>> present', 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is

>

>> present', 'INFO:cephadm:Unit systemd-timesyncd.service is enabled and

>

>> running', 'ERROR: hostname "ceph1.domain.de" does not match expected

>

>> hostname "ceph1"']
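
This check apparently just compares the name passed to ceph orch host add with what hostname returns on the node, which is also why the subsequent adds by FQDN (below) succeeded. Assuming that is all it does, either naming scheme should work as long as both sides match, e.g.

sudo hostnamectl set-hostname ceph1   # then add the host as "ceph1"

or, as happened here, adding the host by its FQDN.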

>

>> 

>

>>  

>

>> 

>

>> Could this be a problem with the ssh key?

>

>> 

>

>>  

>

>> 

>

>> Thanks for the help and happy Easter.

>

>> 

>

>>  

>

>> 

>

>> Marco Savoca

>

>> 

>

>>  

>

>> 

>

>> 

>

>> _______________________________________________

>

>> ceph-users mailing list -- ceph-users@xxxxxxx

>

>> To unsubscribe send an email to ceph-users-leave@xxxxxxx

>

>> 

>

>  

>

> --

>

> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany

>

> (HRB 36809, AG Nürnberg). Managing Director: Felix Imendörffer

>

>  

>

>  

>

 

--

SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany

(HRB 36809, AG Nürnberg). Managing Director: Felix Imendörffer

 

 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
