Re: PGs unknown (osd down) after conversion to cephadm

Sebastian Wagner <swagner@xxxxxxxx> · Thu, 16 Apr 2020 21:23:26 +0200

Hi Marco,

# ceph orch upgrade start --ceph-version 15.2.1

should do the trick.
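
For anyone scripting this, the difference between the two invocations can be made explicit. What follows is only a minimal sketch and not part of cephadm: the helper name `upgrade_command` is hypothetical, and it merely builds the argument list so it can be inspected before anything is run. The plain `--version` flag is presumably consumed by the `ceph` CLI itself, which would explain why only the version banner came back.

```python
import shlex
from typing import List


def upgrade_command(ceph_version: str) -> List[str]:
    """Build the argv for starting a cephadm-managed upgrade.

    Hypothetical helper for illustration only. It uses --ceph-version,
    the flag given in this reply, rather than --version.
    """
    return ["ceph", "orch", "upgrade", "start", "--ceph-version", ceph_version]


cmd = upgrade_command("15.2.1")
# Print the command line instead of executing it.
print(shlex.join(cmd))
```

On a node with an admin keyring, the list could then be handed to `subprocess.run(cmd, check=True)`.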

On 15.04.20 at 17:40, Dr. Marco Savoca wrote:
> Hi Sebastian,
>
> as I said, the orchestrator does not seem to be reachable after the
> cluster reboots. The requested output could only be gathered after
> manually restarting the OSD containers. By the way, if I try to upgrade to
> v15.2.1 via cephadm (ceph orch upgrade start --version 15.2.1), I only
> get the output “ceph version 15.2.0
> (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus (rc)” and the upgrade
> does not start:
>
> sudo ceph orch upgrade status
>
> {
>     "target_image": null,
>     "in_progress": false,
>     "services_complete": [],
>     "message": ""
> }
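
A status document like the one above can be summarised mechanically. The following is a minimal sketch under stated assumptions: the helper `describe_upgrade` and its wording are hypothetical, and the only fields it reads are the four visible in the quoted output.

```python
import json

# Sample copied from the `ceph orch upgrade status` output quoted above.
STATUS_JSON = """
{
    "target_image": null,
    "in_progress": false,
    "services_complete": [],
    "message": ""
}
"""


def describe_upgrade(status: dict) -> str:
    """Summarise an upgrade-status document in one line (hypothetical helper)."""
    if status.get("in_progress"):
        done = ", ".join(status.get("services_complete", [])) or "none yet"
        return f"upgrading to {status.get('target_image')}; complete: {done}"
    if status.get("target_image") is None:
        return "no upgrade scheduled"
    return f"upgrade to {status['target_image']} is not in progress"


print(describe_upgrade(json.loads(STATUS_JSON)))
```

For the quoted output this reports that no upgrade is scheduled, which matches the observation that the upgrade never started.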
>
> Maybe it’s time to open a ticket.
>
> Here are the requested outputs.
>
> sudo ceph orch host ls --format json
>
> [{"addr": "ceph1.domainname.de", "hostname": "ceph1.domainname.de",
> "labels": [], "status": ""}, {"addr": "ceph2.domainname.de", "hostname":
> "ceph2.domainname.de", "labels": [], "status": ""}, {"addr":
> "ceph3.domainname.de", "hostname": "ceph3.domainname.de", "labels": [],
> "status": ""}]
>
> sudo ceph orch ls --format json
>
> [{"container_image_id":
> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":
> "mds.media", "size": 3, "running": 3, "spec": {"placement": {"count":
> 3}, "service_type": "mds", "service_id": "media"}, "last_refresh":
> "2020-04-15T15:26:53.664473", "created": "2020-03-30T23:51:32.239555"},
> {"container_image_id":
> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":
> "mgr", "size": 0, "running": 3, "last_refresh":
> "2020-04-15T15:26:53.664098"}, {"container_image_id":
> "204a01f9b0b6710dd0c0af7f37ce7139c47ff0f0105d778d7104c69282dfbbf1",
> "container_image_name": "docker.io/ceph/ceph:v15", "service_name":
> "mon", "size": 0, "running": 3, "last_refresh":
> "2020-04-15T15:26:53.664270"}]
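
Two things stand out in the service listing above: `mgr` and `mon` carry no `spec` entry, and their `size` is 0 while `running` is 3. A minimal sketch for picking such entries out of the JSON follows. The helper names are hypothetical, the sample is a trimmed copy of the quoted output, and what the missing `spec` means for this cluster is not established in this thread.

```python
import json

# Trimmed copy of the `ceph orch ls --format json` output quoted above
# (image IDs and timestamps omitted for brevity).
SERVICES_JSON = """
[
  {"service_name": "mds.media", "size": 3, "running": 3,
   "spec": {"placement": {"count": 3}, "service_type": "mds",
            "service_id": "media"}},
  {"service_name": "mgr", "size": 0, "running": 3},
  {"service_name": "mon", "size": 0, "running": 3}
]
"""


def services_without_spec(services):
    """Return the names of services that carry no "spec" entry."""
    return [s["service_name"] for s in services if "spec" not in s]


def size_mismatches(services):
    """Map service name to (size, running) where the two values differ."""
    return {
        s["service_name"]: (s["size"], s["running"])
        for s in services
        if s["size"] != s["running"]
    }


services = json.loads(SERVICES_JSON)
print(services_without_spec(services))
print(size_mismatches(services))
```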
>
> Thanks,
>
> Marco
>
> *From: *Sebastian Wagner <mailto:swagner@xxxxxxxx>
> *Sent: *Tuesday, 14 April 2020 16:53
> *To: *ceph-users@xxxxxxx <mailto:ceph-users@xxxxxxx>
> *Subject: *Re: PGs unknown (osd down) after conversion to cephadm
>
> Might be an issue with cephadm.
>
> Do you have the output of `ceph orch host ls --format json` and
> `ceph orch ls --format json`?
>
> On 09.04.20 at 13:23, Dr. Marco Savoca wrote:
> 
>> Hi all,
>>
>> last week I successfully upgraded my cluster to Octopus and converted it
>> to cephadm. The conversion process (following the docs) went well and
>> the cluster ran in an active+clean state.
>>
>> But after a reboot, all OSDs went down a couple of minutes after the
>> reboot and all (100%) of the PGs went into the unknown state. The
>> orchestrator isn’t reachable in this state (ceph orch status
>> never returns).
>>
>> A manual restart of the OSD daemons resolved the problem and the cluster
>> is now active+clean again.
>>
>> This behavior is reproducible.
>>
>> The “ceph log last cephadm” command prints the following (redacted):
>>
>> 2020-03-30T23:07:06.881061+0000 mgr.ceph2 (mgr.1854484) 42 : cephadm [INF] Generating ssh key...
>> 2020-03-30T23:22:00.250422+0000 mgr.ceph2 (mgr.1854484) 492 : cephadm [ERR] _Promise failed
>> Traceback (most recent call last):
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
>>     res = self._on_complete_(*args, **kwargs)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
>>     return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1648, in add_host
>>     spec.hostname, spec.addr, err))
>> orchestrator._interface.OrchestratorError: New host ceph1 (ceph1) failed
>> check: ['INFO:cephadm:podman|docker (/usr/bin/docker) is present',
>> 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present',
>> 'INFO:cephadm:Unit systemd-timesyncd.service is enabled and running',
>> 'ERROR: hostname "ceph1.domain.de" does not match expected hostname
>> "ceph1"']
>> 2020-03-30T23:22:27.267344+0000 mgr.ceph2 (mgr.1854484) 508 : cephadm [INF] Added host ceph1.domain.de
>> 2020-03-30T23:22:36.078462+0000 mgr.ceph2 (mgr.1854484) 515 : cephadm [INF] Added host ceph2.domain.de
>> 2020-03-30T23:22:55.200280+0000 mgr.ceph2 (mgr.1854484) 527 : cephadm [INF] Added host ceph3.domain.de
>> 2020-03-30T23:23:17.491596+0000 mgr.ceph2 (mgr.1854484) 540 : cephadm [ERR] _Promise failed
>> Traceback (most recent call last):
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 444, in do_work
>>     res = self._on_complete_(*args, **kwargs)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 512, in <lambda>
>>     return cls(on_complete=lambda x: f(*x), value=args, name=name, **c_kwargs)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1648, in add_host
>>     spec.hostname, spec.addr, err))
>> orchestrator._interface.OrchestratorError: New host ceph1 (10.10.0.10)
>> failed check: ['INFO:cephadm:podman|docker (/usr/bin/docker) is
>> present', 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is
>> present', 'INFO:cephadm:Unit systemd-timesyncd.service is enabled and
>> running', 'ERROR: hostname "ceph1.domain.de" does not match expected
>> hostname "ceph1"']
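
Both failed `add_host` attempts in the log above report the same mismatch: the host identifies itself by its fully qualified name, while it was being added under the short name `ceph1`. The later `Added host ceph1.domain.de` entries use the full name and succeed. A minimal sketch for pulling the two names out of such an error line follows. The regular expression is written for this one message as quoted, and `parse_mismatch` is a hypothetical helper, not a cephadm function.

```python
import re

# Error text copied from the traceback quoted above.
ERROR_LINE = (
    'ERROR: hostname "ceph1.domain.de" does not match '
    'expected hostname "ceph1"'
)

# Pattern written for this one message format; other cephadm versions
# may word the error differently.
MISMATCH = re.compile(
    r'hostname "(?P<actual>[^"]+)" does not match '
    r'expected hostname "(?P<expected>[^"]+)"'
)


def parse_mismatch(line):
    """Return (actual, expected) hostnames, or None if the line does not match."""
    m = MISMATCH.search(line)
    return (m.group("actual"), m.group("expected")) if m else None


actual, expected = parse_mismatch(ERROR_LINE)
# True when the reported name is an FQDN whose first label is the expected name.
short_vs_fqdn = actual.split(".", 1)[0] == expected
print(actual, expected, short_vs_fqdn)
```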
>>
>> Could this be a problem with the SSH key?
>>
>> Thanks for the help and happy Easter.
>>
>> Marco Savoca
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> -- 
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer
