The next thing I tried was taking a low-impact host (a Rados GW instance, in this case), purging all Ceph/Podman state from it, and re-installing it from scratch. But I'm now seeing this strange error when simply re-adding the host via "ceph orch host add", at the point where a disk inventory is taken:

2021-01-11 12:01:16,028 DEBUG Running command: /bin/podman run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.7 -e NODE_NAME=ceph02-rgw-01 -v /var/run/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c:/var/run/ceph:z -v /var/log/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c:/var/log/ceph:z -v /var/lib/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm docker.io/ceph/ceph:v15.2.7 inventory --format=json --filter-for-batch
2021-01-11 12:01:16,518 INFO /bin/podman:stderr usage: ceph-volume inventory [-h] [--format {plain,json,json-pretty}] [path]
2021-01-11 12:01:16,519 INFO /bin/podman:stderr ceph-volume inventory: error: unrecognized arguments: --filter-for-batch

Deployment of the RGW container to the host is similarly blocked again.

This seems to be covered by this ceph-volume issue, https://tracker.ceph.com/issues/48694, but it may not have been addressed yet... (A manual re-run of the same inventory command, minus the new flag, is sketched below the quoted message.)

On Mon, 11 Jan 2021 at 10:36, Paul Browne <pfb29@xxxxxxxxx> wrote:
> Hello all,
>
> I've been having some real trouble getting cephadm to apply some very
> minor point-release updates cleanly; twice now, applying the point updates
> 15.2.6 -> 15.2.7 and then 15.2.7 -> 15.2.8 has gotten blocked somewhere,
> ended up making no progress, and required digging deep into internals to
> unblock things.
>
> In the most recent attempt, 15.2.7 -> 15.2.8, the Orchestrator cleanly
> replaced the Mon and MGR containers in the first steps, but when it came
> to replacing the Crash daemon containers, the running 15.2.7 Crash
> container was purged and the container update then got blocked trying to
> start it again on the older image, leading to an infinite loop in the
> logs of Podman trying to start a non-existent container;
>
> https://pastebin.com/9zdMs1XU
>
> Forcing a `ceph orch daemon rm` of the affected Crash daemon on that host
> just repeats the loop.
>
> I then tried removing the Crash service and all of its daemons through
> the Orchestrator API.
>
> This purged all running Crash containers from all hosts; I then
> re-applied a service spec to restart them, hopefully on the new image.
>
> The Orchestrator removal of the Crash containers seems to have left
> container state dangling on the hosts, however, as we now see the same
> issue of Crash containers failing to start on *every* host in the cluster
> due to left-over container state;
>
> https://pastebin.com/tjaegxqg
>
> At this point I'm not certain whether Podman (v1.6.4 EPEL, CentOS 7.9) or
> the Orchestrator is to blame for leaving this state dangling and blocking
> new container creation, but it's proving a real problem in applying even
> simple minor-version point updates.
>
> Has anyone else been seeing similar behaviour when applying minor-version
> updates via cephadm+Orchestrator? Are there any good workarounds to clean
> up the dangling container state?
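
For anyone wanting to reproduce or sanity-check this by hand: below is the same podman/ceph-volume invocation that cephadm ran, taken from the log above (the fsid, hostname, and image tag are from my cluster, so adjust them for your own), with only the unsupported --filter-for-batch flag dropped. If the inventory then returns JSON cleanly, that would suggest the only problem is the newer flag being passed to an older ceph-volume in the image:

  # Check which arguments this image's ceph-volume inventory actually accepts
  /bin/podman run --rm --entrypoint /usr/sbin/ceph-volume \
      docker.io/ceph/ceph:v15.2.7 inventory --help

  # Re-run the exact inventory command from the cephadm log, minus --filter-for-batch
  /bin/podman run --rm --ipc=host --net=host \
      --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk \
      -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.7 -e NODE_NAME=ceph02-rgw-01 \
      -v /var/run/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c:/var/run/ceph:z \
      -v /var/log/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c:/var/log/ceph:z \
      -v /var/lib/ceph/fbbe7cac-3324-11eb-8186-34800d5b932c/crash:/var/lib/ceph/crash:z \
      -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys \
      -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm \
      docker.io/ceph/ceph:v15.2.7 inventory --format=json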
--
*******************
Paul Browne
Research Computing Platforms
University Information Services
Roger Needham Building
JJ Thompson Avenue
University of Cambridge
Cambridge
United Kingdom
E-Mail: pfb29@xxxxxxxxx
Tel: 0044-1223-746548
*******************