Hi,
Just to note this:
ceph-volume activate takes time to complete
https://tracker.ceph.com/issues/57627
...is a showstopper for me in 16.2.11. I am trying to upgrade from
16.2.9, in particular to get this fix:
Pacific: Significant write amplification as compared to Nautilus
https://tracker.ceph.com/issues/58530
The upgrade to 16.2.11 stopped with:
$ ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph@sha256:748387ea347157fb9df9bb2620d873ac633ff80d0308bcc82a74a821df0d0cfa",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [
        "mon",
        "mgr"
    ],
    "progress": "10/90 daemons upgraded",
    "message": "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.24 on host b2 failed.",
    "is_paused": true
}
This is likely because host "b2" is being hit very badly by the
"ceph-volume activate takes time to complete" problem, due to the large
number of block devices on the system:
b2$ lsblk -P -p -o 'NAME' | wc -l
924
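For anyone trying to judge whether their own hosts are likely to hit this, here is a minimal Python sketch that breaks the lsblk count down by device type. It assumes output from `lsblk -P -p -o NAME,TYPE` (the TYPE column is an addition to the command above); the function name and the sample text are made up for illustration and are not taken from b2.

```python
import shlex
from collections import Counter


def count_by_type(lsblk_pairs_output: str) -> Counter:
    """Count devices per TYPE from `lsblk -P -p -o NAME,TYPE` output."""
    counts = Counter()
    for line in lsblk_pairs_output.splitlines():
        if not line.strip():
            continue
        # Each line looks like: NAME="/dev/sda" TYPE="disk"
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        counts[fields.get("TYPE", "unknown")] += 1
    return counts


# Made-up sample, not real output from the affected host.
SAMPLE = '''NAME="/dev/sda" TYPE="disk"
NAME="/dev/sda1" TYPE="part"
NAME="/dev/mapper/vg0-osd0" TYPE="lvm"
NAME="/dev/sdb" TYPE="disk"
'''

if __name__ == "__main__":
    for dev_type, n in count_by_type(SAMPLE).most_common():
        print(dev_type, n)
```

On a real host the text would come from running the lsblk command and passing its stdout to `count_by_type`.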
Attempts to start the affected OSD via systemd failed with a timeout.
I tried starting the OSD manually per its unit.run, but the "ceph-volume
activate" step had been running for over an hour when I gave up.
I've been able to manually revert this particular OSD (the first one to be
updated on this particular box) back to 16.2.9 by updating its unit.run
file and restarting the OSD, so my cluster is healthy.
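For anyone who wants to script a check for the paused state shown in the `ceph orch upgrade status` output earlier in this mail, a minimal Python sketch follows. The function name is hypothetical, and treating a message that starts with "Error:" as a failure is an assumption based only on the one message quoted here. The embedded JSON is a trimmed copy of that output.

```python
import json


def upgrade_blocked(status: dict) -> bool:
    """True if the upgrade is in progress but paused with an error message."""
    message = status.get("message") or ""
    return (
        bool(status.get("in_progress"))
        and bool(status.get("is_paused"))
        # Assumption: failures are reported with an "Error:" prefix.
        and message.startswith("Error:")
    )


# Trimmed copy of the status output quoted earlier in this mail.
STATUS_JSON = '''
{
    "in_progress": true,
    "services_complete": ["mon", "mgr"],
    "progress": "10/90 daemons upgraded",
    "message": "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.24 on host b2 failed.",
    "is_paused": true
}
'''

if __name__ == "__main__":
    status = json.loads(STATUS_JSON)
    if upgrade_blocked(status):
        print("upgrade paused:", status["message"])
```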
I see the fix has been backported:
https://tracker.ceph.com/issues/58790
I'm guessing it shouldn't be too much of a problem running mixed versions
for a while until 16.2.12 comes out?
$ ceph versions
{
    "mon": {
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 5
    },
    "mgr": {
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 2,
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 79
    },
    "mds": {},
    "overall": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 2,
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 87
    }
}
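To keep an eye on which daemon types are running mixed versions in the meantime, something like the following Python sketch could be used. It only parses JSON shaped like the `ceph versions` output above; the function name is hypothetical and the embedded sample is a copy of the "osd" and "mon" sections from that output.

```python
import json


def mixed_version_daemons(versions: dict) -> dict:
    """Return {daemon_type: {version: count}} for types running more than one version."""
    return {
        daemon_type: by_version
        for daemon_type, by_version in versions.items()
        # "overall" is a cluster-wide summary, not a daemon type.
        if daemon_type != "overall" and len(by_version) > 1
    }


# Copy of part of the `ceph versions` output above.
VERSIONS_JSON = '''
{
    "mon": {
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 5
    },
    "osd": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 2,
        "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 79
    },
    "mds": {}
}
'''

if __name__ == "__main__":
    for daemon_type, by_version in mixed_version_daemons(json.loads(VERSIONS_JSON)).items():
        for version, count in by_version.items():
            print(daemon_type, count, version)
```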
Cheers,
Chris