'ceph orch upgrade...' causes an rbd outage on a proxmox cluster

Hi everyone, 
(sorry for the spam, apparently I was not subscribed to the ml) 

I have a Ceph test cluster and a Proxmox test cluster (to try upgrades in test before doing them in production). 
My Ceph cluster is made up of three servers running Debian 11, with two separate networks (cluster_network and public_network, in VLANs). 
It runs Ceph version 16.2.10 (deployed with cephadm on Docker). 
Each server has one MGR, one MON and 8 OSDs. 
  cluster:
    id:     xxx
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph03,ceph02 (age 2h)
    mgr: ceph03(active, since 77m), standbys: ceph01, ceph02
    osd: 24 osds: 24 up (since 7w), 24 in (since 6M)

  data:
    pools:   3 pools, 65 pgs
    objects: 29.13k objects, 113 GiB
    usage:   344 GiB used, 52 TiB / 52 TiB avail
    pgs:     65 active+clean

  io:
    client: 1.3 KiB/s wr, 0 op/s rd, 0 op/s wr
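
For reference, the daemon layout can also be double-checked from the orchestrator side with, for example (ceph01 here is just one of the hosts): 

sudo ceph orch host ls 
sudo ceph orch ps ceph01 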

The Proxmox cluster is also made up of 3 servers running Proxmox 7.2-7 (with the Proxmox Ceph Pacific packages, which are at version 16.2.9). The Ceph storage used is RBD (over the Ceph public_network). I added the RBD datastores simply via the GUI. 
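
For context, the resulting RBD entry in /etc/pve/storage.cfg looks roughly like this (the storage id, pool name and monitor addresses below are made-up placeholders, not my real values): 

rbd: example-rbd 
        content images 
        krbd 0 
        monhost 10.0.0.11 10.0.0.12 10.0.0.13 
        pool example-pool 
        username admin 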

So far so good. I have several VMs on each of the Proxmox nodes. 

Things go wrong when I upgrade Ceph to 16.2.11. 
I don't like it when the upgrade does everything for me without any control, so I did a "staggered upgrade", following the official procedure (https://docs.ceph.com/en/pacific/cephadm/upgrade/#staggered-upgrade). As the version I'm starting from doesn't support staggered upgrades, I followed the procedure for upgrading to a version that supports staggered upgrade from one that doesn't (https://docs.ceph.com/en/pacific/cephadm/upgrade/#upgrading-to-a-version-that-supports-staggered-upgrade-from-one-that-doesn-t). 
When I do the "ceph orch redeploy" of the two standby MGRs, everything is fine. 
I do the "sudo ceph mgr fail", everything is fine (it switches well to an mgr which was standby, so I get an MGR 16.2.11). 
However, when I run "sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr", it upgrades the last MGR that had not yet been updated (so far everything is still fine), but it then does a final restart of all the MGRs to finish, and at that point Proxmox apparently loses the RBD connection and powers off all my VMs. 
Here is the message in the proxmox syslog: 
Feb 2 16:20:52 pmox01 QEMU[436706]: terminate called after throwing an instance of 'std::system_error' 
Feb 2 16:20:52 pmox01 QEMU[436706]: what(): Resource deadlock avoided 
Feb 2 16:20:52 pmox01 kernel: [17038607.686686] vmbr0: port 2(tap102i0) entered disabled state 
Feb 2 16:20:52 pmox01 kernel: [17038607.779049] vmbr0: port 2(tap102i0) entered disabled state 
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Succeeded. 
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Consumed 43.136s CPU time. 
Feb 2 16:20:53 pmox01 qmeventd[446872]: Starting cleanup for 102 
Feb 2 16:20:53 pmox01 qmeventd[446872]: Finished cleanup for 102 

From Ceph's point of view, everything is fine: it does the upgrade and reports that everything is OK in the end. 
Ceph is now on 16.2.11 and the health is OK. 
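
(As a side note, the running versions and the upgrade state can be checked with, for example: 

sudo ceph versions 
sudo ceph orch upgrade status 
sudo ceph orch ps --daemon_type mgr 
) 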

When I downgrade the MGRs again and then restart the procedure, I hit the problem again every time. It's very reproducible. 
In my tests, the "sudo ceph orch upgrade" command always triggers the problem, even when trying a real staggered upgrade from and to version 16.2.11 with the command: 
sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr --hosts ceph01 --limit 1 

Does anyone have an idea? 

Thank you everyone! 
Pierre. 