This sounds similar to something I saw once with an upgrade from 17.2.0 to
17.2.1 (that I failed to reproduce). In that case, what fixed it was stopping
the upgrade and manually redeploying both mgr daemons with the new version:
"ceph orch daemon redeploy <standby-mgr-daemon-name> --image
<image-for-version-upgrading-to>", wait a few minutes for the redeploy to
happen, then "ceph mgr fail", wait a minute, then the same redeploy command
for the other mgr. After doing that and starting the upgrade again it seemed
to go okay.

Also, I'd recommend using "--image" for the upgrade command rather than
"--ceph-version". Somebody else recently had an upgrade issue while using
that "--ceph-version" flag (https://tracker.ceph.com/issues/56485), which has
me wondering whether there's a bug with it.
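Putting that together, the rough sequence would be something like the sketch
below. The mgr daemon names are placeholders (you can see yours with
"ceph orch ps --daemon-type mgr"), and quay.io/ceph/ceph:v16.2.9 is just an
example image; substitute whatever image you're actually upgrading to.

  ceph orch upgrade stop
  ceph orch daemon redeploy <standby-mgr-daemon-name> --image quay.io/ceph/ceph:v16.2.9
  # wait a few minutes for the standby mgr to come back on the new image
  ceph mgr fail
  # wait a minute for the failover, then redeploy the other (now standby) mgr
  ceph orch daemon redeploy <other-mgr-daemon-name> --image quay.io/ceph/ceph:v16.2.9
  ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.9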
On Fri, Jul 8, 2022 at 10:37 AM Stéphane Caminade
<stephane.caminade@xxxxxxxxxxxxx> wrote:

> Hi,
>
> I'm still a little stuck with this situation, no clues?
>
> Regards,
>
> Stephane
>
> On 29/06/2022 at 10:34, Stéphane Caminade wrote:
> > Dear list,
> >
> > After an upgrade from a package-based cluster running 16.2.9 to cephadm
> > with docker (following https://docs.ceph.com/en/pacific/cephadm/adoption/),
> > I have a strange discrepancy between the running versions:
> >
> > ceph versions
> > {
> >     "mon": {
> >         "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 3
> >     },
> >     "mgr": {
> >         "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 2
> >     },
> >     "osd": {
> >         "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 181
> >     },
> >     "mds": {
> >         "ceph version 16.2.5-387-g7282d81d (7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3
> >     },
> >     "overall": {
> >         "ceph version 16.2.5-387-g7282d81d (7282d81d2c500b5b0e929c07971b72444c6ac424) pacific (stable)": 3,
> >         "ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)": 186
> >     }
> > }
> >
> > I tried asking cephadm to upgrade to 16.2.9 (ceph orch upgrade start
> > --ceph-version 16.2.9), but it only seems to cycle between the active
> > managers (about every 15 to 20s), without doing anything more. Here is
> > part of the logs from one of the MGRs:
> >
> > 7f599cbe9700  0 [cephadm INFO cephadm.upgrade] Upgrade: Need to upgrade myself (mgr.inf-ceph-mds)
> > 7f599cbe9700  0 log_channel(cephadm) log [INF] : Upgrade: Need to upgrade myself (mgr.inf-ceph-mds)
> > 7f599cbe9700  0 [cephadm INFO cephadm.services.cephadmservice] Failing over to other MGR
> > 7f599cbe9700  0 log_channel(cephadm) log [INF] : Failing over to other MGR
> >
> > Something strange as well: it seems to be looking for more daemons to
> > upgrade (191) than the 189 present (186 on 16.2.9 and 3 on
> > 16.2.5-387-nnnn):
> >
> > ceph orch upgrade status
> > {
> >     "target_image": "quay.io/ceph/ceph@sha256:5d3c9f239598e20a4ed9e08b8232ef653f5c3f32710007b4cabe4bd416bebe54",
> >     "in_progress": true,
> >     "services_complete": [
> >         "mgr",
> >         "mon"
> >     ],
> >     "progress": "187/191 daemons upgraded",
> >     "message": ""
> > }
> >
> > So I have two questions:
> >
> > 1. Do you have any pointers as to where I could look for information
> > on what is going on (or not, actually)?
> >
> > 2. Would it be safe to stop the upgrade, and ask it to safely move to
> > 17.2.1 instead?
> >
> > Best regards,
> >
> > Stephane
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx