Hi All. I was on luminous 12.2.0, as I do *not* enable repo updates for critical software (e.g. openstack / ceph). Upgrades need to happen on an intentional basis! So I first upgraded to luminous 12.2.11 following the guide and release notes:

[root@lvtncephx110 ~]# ceph version
ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)

I have also followed Eugen's advice and set the appropriate cluster flag:

ceph osd require-osd-release luminous

Now my cluster shows:

[root@lvtncephx110 ~]# ceph osd dump | grep recovery
flags sortbitwise,recovery_deletes,purged_snapdirs

I like Paul's note to perform a full deep scrub. It will take some time to complete, but it ensures that all data is touched and pruned as necessary - a good fsck on each OSD:

ceph osd deep-scrub all

[root@lvtncephx110 ~]# ceph status
  cluster:
    id:     5fabf1b2-cfd0-44a8-a6b5-fb3fd0545517
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum lvtncephx121,lvtncephx122,lvtncephx123
    mgr: lvtncephx121(active), standbys: lvtncephx122, lvtncephx123
    mds: cephfs-1/1/1 up {0=lvtncephx152=up:active}, 1 up:standby
    osd: 18 osds: 18 up, 18 in
    rgw: 2 daemons active

  data:
    pools:   23 pools, 2016 pgs
    objects: 2.67M objects, 10.1TiB
    usage:   20.2TiB used, 38.6TiB / 58.8TiB avail
    pgs:     2011 active+clean
             5    active+clean+scrubbing+deep
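To keep an eye on progress while the deep scrub runs, a rough check (assuming the usual luminous column names in 'ceph pg dump') is to watch the scrubbing+deep count and the per-PG DEEP_SCRUB_STAMP values:

# how many PGs are still deep scrubbing
watch -n 60 "ceph status | grep 'scrubbing+deep'"

# per-PG last deep-scrub timestamps (DEEP_SCRUB_STAMP column)
ceph pg dump pgs 2>/dev/null | less -S

Once no PGs show scrubbing+deep and every PG has a recent DEEP_SCRUB_STAMP, the full pass is done.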
This means that I *could* upgrade to mimic now (at least as soon as the deep scrub completes). However, other posts show that there could be a problem with pglog_hardlimit, so I should wait until 13.2.5.

Thanks for the suggestions. I feel confident in our ability to upgrade to Mimic within the next couple of months (time to let 13.2.5 settle).

Andy

> On Feb 7, 2019, at 1:21 PM, Paul Emmerich <paul.emmerich@xxxxxxxx> wrote:
>
> You need to run a full deep scrub before continuing the upgrade; the
> reason for this is that the deep scrub migrates the format of some
> snapshot-related on-disk data structures.
>
> It looks like you only tried a normal scrub, not a deep scrub.
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Feb 7, 2019 at 4:34 PM Eugen Block <eblock@xxxxxx> wrote:
>>
>> Hi,
>>
>> could it be a missing 'ceph osd require-osd-release luminous' on your cluster?
>>
>> When I check a luminous cluster I get this:
>>
>> host1:~ # ceph osd dump | grep recovery
>> flags sortbitwise,recovery_deletes,purged_snapdirs
>>
>> The flags in the code you quote seem related to that.
>> Can you check that output on your cluster?
>>
>> Found this in a thread from last year [1].
>>
>> Regards,
>> Eugen
>>
>> [1] https://www.spinics.net/lists/ceph-devel/msg40191.html
>>
>> Zitat von Andrew Bruce <dbmail1771@xxxxxxxxx>:
>>
>>> Hello All! Yesterday I started the upgrade from luminous to mimic with
>>> one of my 3 MONs.
>>>
>>> After applying the mimic yum repo and updating, a restart reports the
>>> following error in the MON log file:
>>>
>>> ==> /var/log/ceph/ceph-mon.lvtncephx121.log <==
>>> 2019-02-07 10:02:40.110 7fc8283ed700 -1 mon.lvtncephx121@0(probing)
>>> e4 handle_probe_reply existing cluster has not completed a full
>>> luminous scrub to purge legacy snapdir objects; please scrub before
>>> upgrading beyond luminous.
>>>
>>> My question is simply: what exactly does this require?
>>>
>>> Yesterday afternoon I did a manual:
>>>
>>> ceph osd scrub all
>>>
>>> But that has zero effect. I still get the same message on restarting
>>> the MON.
>>>
>>> I have no errors in the cluster except for the single MON
>>> (lvtncephx121) that I'm working to migrate to mimic first:
>>>
>>> [root@lvtncephx110 ~]# ceph status
>>>   cluster:
>>>     id:     5fabf1b2-cfd0-44a8-a6b5-fb3fd0545517
>>>     health: HEALTH_WARN
>>>             1/3 mons down, quorum lvtncephx122,lvtncephx123
>>>
>>>   services:
>>>     mon: 3 daemons, quorum lvtncephx122,lvtncephx123, out of quorum: lvtncephx121
>>>     mgr: lvtncephx122(active), standbys: lvtncephx123, lvtncephx121
>>>     mds: cephfs-1/1/1 up {0=lvtncephx151=up:active}, 1 up:standby
>>>     osd: 18 osds: 18 up, 18 in
>>>     rgw: 2 daemons active
>>>
>>>   data:
>>>     pools:   23 pools, 2016 pgs
>>>     objects: 2608k objects, 10336 GB
>>>     usage:   20689 GB used, 39558 GB / 60247 GB avail
>>>     pgs:     2016 active+clean
>>>
>>>   io:
>>>     client: 5612 B/s rd, 3756 kB/s wr, 1350 op/s rd, 412 op/s wr
>>>
>>> FWIW, the source code has the following:
>>>
>>> // Monitor.cc
>>> if (!osdmon()->osdmap.test_flag(CEPH_OSDMAP_PURGED_SNAPDIRS) ||
>>>     !osdmon()->osdmap.test_flag(CEPH_OSDMAP_RECOVERY_DELETES)) {
>>>   derr << __func__ << " existing cluster has not completed a full luminous"
>>>        << " scrub to purge legacy snapdir objects; please scrub before"
>>>        << " upgrading beyond luminous." << dendl;
>>>   exit(0);
>>> }
>>>
>>> So, two questions:
>>> How do I show the current flags in the OSD map checked by the monitor?
>>> How do I get these flags set so the MON will actually start?
>>>
>>> Thanks,
>>> Andy
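PS, to pull the answers to those last two questions into one place: this is only a rough consolidation of what Eugen and Paul describe above, not an official procedure, and it assumes the cluster is already fully on 12.2.11 as described at the top.

# show the current flags that the Monitor.cc check tests for
ceph osd dump | grep flags

# Eugen's point: make sure the cluster requires luminous OSDs
ceph osd require-osd-release luminous

# Paul's point: a full deep scrub is what purges the legacy snapdir objects
ceph osd deep-scrub all

After that, the flags line should include recovery_deletes and purged_snapdirs, which are exactly the bits the monitor checks before allowing an upgrade beyond luminous.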