I was upgrading a really old cluster from Infernalis (9.2.1) to Jewel (10.2.3) and ran into some weird but interesting issues. This cluster started its life on Bobtail and has gone Bobtail -> Dumpling -> Emperor -> Firefly -> Giant -> Hammer -> Infernalis and now Jewel.

When I upgraded the first MON (out of 3) everything worked as it should. When I upgraded the second, the first and second both crashed. I reverted the binaries on one of them to Infernalis, deleted the store.db folder on the other and started it as Jewel (so I then had 2x Infernalis and 1x Jewel), and let it sync the store. After that I upgraded the remaining nodes and everything was fine, or so it mostly seemed, apart from the usual "failed to encode map xxx with expected crc" messages.

I then noticed some weird size graphs in calamari, and looking closer, "ceph df" showed:

GLOBAL:
    SIZE    AVAIL    RAW USED    %RAW USED
    10E     932P     5E          52.46

Oooh, I have a really big cluster; it is usually a lot smaller (actual size is 655T).

A snippet cut from "ceph -s":

     health HEALTH_ERR
            1 full osd(s)
            flags full
     pgmap v77393779: 6384 pgs, 26 pools, 66584 GB data, 52605 kobjects
            5502 PB used, 17316 PB / 10488 PB avail

"ceph health detail" shows:

    osd.89 is full at 266%

That is one of the OSDs that was being upgraded at the time.

The cluster ends up recovering on its own and showing the regular sane values, but this does seem to indicate some sort of underlying issue. Has anyone seen such an issue?
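
(For anyone hitting the same mon crash, here is a rough sketch of the store resync step described above. It assumes a hypothetical monitor id "a", the default data path /var/lib/ceph/mon/ceph-a, and the systemd unit names used on Jewel; adapt to your own deployment, and keep a quorum of healthy mons running while the rebuilt one syncs.)

    # stop the crashed mon (use "service ceph stop mon.a" on pre-systemd hosts)
    systemctl stop ceph-mon@a

    # move the old store aside rather than deleting it outright,
    # so it can be put back if the resync does not work out
    mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.bak

    # start the mon on the Jewel binaries and let it rebuild store.db
    # by syncing from the monitors still in quorum
    systemctl start ceph-mon@a

    # watch it catch up and rejoin quorum
    ceph mon stat
    ceph -s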