Thanks Kefu!

Downgrading the mons to 0.94.6 got us out of this situation. I appreciate you tracking this down!

Bryan

On 10/4/16, 1:18 AM, "ceph-users on behalf of kefu chai" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of tchaikov@xxxxxxxxx> wrote:

>Hi ceph users,
>
>If a user upgrades a cluster from a prior release to v0.94.7 or later by
>following these steps:
>
>1. upgrade the monitors first,
>2. and then the OSDs,
>
>the cluster log can be expected to be flooded with messages like:
>
>2016-07-12 08:42:42.1234567 osd.1234 [WRN] failed to encode map e4321 with expected crc
>
>This is because we changed [1] the encoding of OSDMap in v0.94.7, and the
>monitors start sending the incremental OSDMaps with the new encoding to
>the OSDs once all quorum members are at the new version. But the OSDs
>still at the old version re-encode the osdmaps with the old encoding and
>then compare the resulting CRC with the one carried by the received
>incremental maps. The two don't match, so in this case the OSDs ask the
>monitors for the full map.
>
>For a large Ceph cluster, the CRC mismatch has several consequences:
>
>1. the monitors are flooded with this cluster log message,
>2. the monitors are burdened by sending the full maps,
>3. the network is saturated by the osdmap messages carrying the requested
>   full maps,
>4. slow requests are observed if the updated osdmaps are delayed by the
>   saturated network,
>
>as reported by our users [2,3,4,5].
>
>The interim solution for those who are stuck in the middle of an upgrade
>is:
>
>1. revert all the monitors to the previous version,
>2. upgrade the OSDs to the version you want to upgrade to,
>3. upgrade the monitors to the version you want to upgrade to.
>
>And if you plan to upgrade from a version prior to v0.94.7 to v0.94.7 or
>later, please:
>
>1. upgrade the OSDs to the version you want to upgrade to,
>2. upgrade the monitors to the version you want to upgrade to.
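The CRC check described in the quoted explanation, and why the upgrade order matters, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Ceph's code: `ToyMap`, `encode_map()`, `osd_sees_crc_mismatch()`, the two encoding versions, and the rule that an OSD re-encodes with the lower of its own version and the monitors' version are all hypothetical simplifications. A bitwise CRC-32C stands in for the checksum computed over the encoded map.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an OSDMap, holding only what the sketch needs.
struct ToyMap {
  uint32_t epoch;
  uint32_t pool_count;
  uint8_t new_flags;  // field that only the newer encoding carries
};

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
uint32_t crc32c(const std::vector<uint8_t>& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (uint8_t b : data) {
    crc ^= b;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Append a 32-bit value in little-endian byte order.
void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Hypothetical encoder with two versions: v1 ("old") writes a zero
// placeholder where v2 ("new") writes new_flags.
std::vector<uint8_t> encode_map(const ToyMap& m, int version) {
  assert(version == 1 || version == 2);
  std::vector<uint8_t> out;
  put_u32(out, m.epoch);
  put_u32(out, m.pool_count);
  out.push_back(static_cast<uint8_t>(version));  // struct_v
  out.push_back(version >= 2 ? m.new_flags : uint8_t{0});
  return out;
}

// mon_version: encoding the monitors used for the incremental (per the
//   email, the new one only once every quorum member is upgraded).
// osd_version: newest encoding the OSD knows.
// Assumption: an OSD re-encodes with min(osd_version, mon_version).
// Returns true when the CRCs disagree, i.e. the OSD would log
// "failed to encode map ... with expected crc" and request a full map.
bool osd_sees_crc_mismatch(const ToyMap& m, int mon_version, int osd_version) {
  uint32_t expected = crc32c(encode_map(m, mon_version));
  int osd_encoding = osd_version < mon_version ? osd_version : mon_version;
  uint32_t actual = crc32c(encode_map(m, osd_encoding));
  return expected != actual;
}
```

In this model the OSDs-first order never trips the check: while the monitors still run the old version, every OSD re-encodes with that same old encoding. The mons-first order trips it for every old OSD, which corresponds to the flood of full-map requests described in the email.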
>
>For users who prefer to upgrade from a version prior to v0.94.7 directly
>to jewel, if your cluster is relatively large, we suggest upgrading to
>the latest hammer release first by following the steps above.
>
>In the short term, we are preparing a fix [6] for hammer so that the
>monitors will send osdmaps encoded with the lower-version encoding.
>
>In the long term, we won't use the new release feature bit in the
>cluster unless it is explicitly allowed [7].
>
>
>@ceph developers,
>
>If we want to bump the encoding version of OSDMap or any of its
>(sub)fields, I think it would be desirable to tie the new encoder to the
>new major release feature bit. For instance, if a new field named "foo"
>is added to `pg_pool_t` in kraken, then, since `map<int64_t,pg_pool_t>
>pools` is in turn a field of `OSDMap`, we need to be careful when
>updating `pg_pool_t::encode()`, like:
>
>void pg_pool_t::encode(bufferlist& bl, uint64_t features) const {
>  // ...
>  if ((features & CEPH_FEATURE_SERVER_KRAKEN) == 0) {
>    // peer lacks the kraken bit: encode in the jewel way
>    return;
>  }
>  // encode in the kraken way
>}
>
>This is because:
>
>- it would be difficult for the monitors to send osdmaps that every OSD
>  can understand, and
>- with [7], we disable/enable the new encoder by excluding/including the
>  major release feature bit.
>
>--
>[1] sha1 039240418060c9a49298dacc0478772334526dce
>[2] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg30783.html
>[3] http://www.spinics.net/lists/ceph-users/msg28296.html
>[4] http://ceph-users.ceph.narkive.com/rPGrATpE/v0-94-7-hammer-released
>[5] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013189.html
>[6] http://tracker.ceph.com/issues/17386
>[7] https://github.com/ceph/ceph/pull/11284
>
>--
>Regards
>Kefu Chai
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
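The feature-gated `pg_pool_t::encode()` pattern from the developer note can be sketched in a self-contained form. This is a hypothetical toy, not Ceph's code: `ToyPool`, `encode_pool()`, `common_features()`, and `TOY_FEATURE_SERVER_KRAKEN` (standing in for `CEPH_FEATURE_SERVER_KRAKEN`, with an arbitrary bit value) are invented for illustration. The rule that the monitors use the intersection of the features every OSD advertises is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bit standing in for CEPH_FEATURE_SERVER_KRAKEN.
constexpr uint64_t TOY_FEATURE_SERVER_KRAKEN = 1ull << 40;

// Hypothetical pool record; `foo` is the field added in the "kraken" encoding.
struct ToyPool {
  uint32_t pg_num;
  uint32_t foo;
};

// Append a 32-bit value in little-endian byte order.
void put_u32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Mirrors the shape of the pg_pool_t::encode() snippet: the newer layout
// is emitted only when the caller passes the kraken feature bit.
void encode_pool(const ToyPool& p, std::vector<uint8_t>& bl, uint64_t features) {
  if ((features & TOY_FEATURE_SERVER_KRAKEN) == 0) {
    bl.push_back(1);  // struct_v = 1: encode in the jewel way (no foo)
    put_u32(bl, p.pg_num);
    return;
  }
  bl.push_back(2);  // struct_v = 2: encode in the kraken way
  put_u32(bl, p.pg_num);
  put_u32(bl, p.foo);
}

// Assumption: the monitors pick one feature set for all OSDs by
// intersecting what every OSD advertises, so a single old OSD keeps
// the whole cluster on the old encoding.
uint64_t common_features(const std::vector<uint64_t>& osd_features) {
  uint64_t common = ~0ull;
  for (uint64_t f : osd_features) common &= f;
  return common;
}
```

With `common_features()` as written, one OSD that lacks the bit keeps every pool on the jewel layout. A single encoder gated on the release bit is easier to reason about than producing a different encoding per OSD.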