Hi again,

We just finished the upgrade (5 mons, 1200 OSDs). As I mentioned, we
had loads of monitor elections and slow requests during the upgrade.
perf top showed the leader spending lots of time in
LogMonitor::preprocess_log:

  43.79%  ceph-mon  [.] LogMonitor::preprocess_log

To mitigate, I tried a few things to minimize osdmap changes: I set
noout and osd crush update on start = false. I also increased the mon
lease timeouts:

  ceph tell mon.* injectargs -- --mon_lease=15 --mon_lease_renew_interval=9 --mon_lease_ack_timeout=30

None of that really helped. But what finally did was:

  ceph tell osd.* injectargs -- --clog_to_monitors=false

which made things much better.

When I upgrade our 2nd cluster tomorrow, I'll set
clog_to_monitors=false before starting (rough ceph.conf sketch at the
bottom of this mail).

Cheers, Dan

On Tue, May 24, 2016 at 10:02 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I'm mid-upgrade on a large cluster now. The upgrade is not going smoothly --
> it looks like the ceph-mons are getting bombarded by so many of these crc
> error warnings that they go into elections.
>
> Did anyone upgrade a large cluster from 0.94.6 to 0.94.7? If not, I'd advise
> waiting until this is better understood.
>
> Cheers, Dan
>
>
> On Tue, May 17, 2016 at 2:14 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>>
>> Hello,
>>
>> for the record, I did the exact same sequence (no MDS) on my test cluster
>> with exactly the same results.
>>
>> Didn't report it as I assumed it to be a noisy (but harmless)
>> upgrade artifact.
>>
>> Christian
>>
>> On Tue, 17 May 2016 14:07:21 +0200 Dan van der Ster wrote:
>>
>> > On Tue, May 17, 2016 at 1:56 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> > > On Tue, 17 May 2016, Dan van der Ster wrote:
>> > >> Hi Sage et al,
>> > >>
>> > >> I'm updating our pre-prod cluster from 0.94.6 to 0.94.7, and after
>> > >> upgrading the ceph-mons I'm getting loads of warnings like:
>> > >>
>> > >> 2016-05-17 10:01:29.314785 osd.76 [WRN] failed to encode map e103116
>> > >> with expected crc
>> > >>
>> > >> I've seen that this error is whitelisted in the qa-suite:
>> > >> https://github.com/ceph/ceph-qa-suite/pull/602/files
>> > >>
>> > >> Is it really harmless? (This is the first time I've seen such a
>> > >> warning.)
>> > >
>> > > Are you sure you were upgrading from v0.94.6?
>> >
>> > Absolutely. I first updated the mons, which I restarted into quorum
>> > with 0.94.7. Then any change to the osdmap triggered the "failed to
>> > encode" warning.
>> > The upgrade sequence went like this:
>> >
>> > Update mons 0.94.6 to 0.94.7, restart, quorum. No warnings.
>> > Update mds's 0.94.6 to 0.94.7, restart. Warnings from ~all osds.
>> > Update osds 0.94.6 to 0.94.7, restart host by host. The 0.94.6 osds
>> > printed warnings, the new OSDs did not.
>> >
>> > > I don't see anything that
>> > > would trigger these warnings going from .6 to .7, which is strange.
>> >
>> > Could the osdmap GMT hitset changes have caused it? Commits from Mar 24 here:
>> >
>> > https://github.com/ceph/ceph/compare/v0.94.6...v0.94.7?expand=1
>> >
>> > > That said, the errors are generally harmless -- it just means the
>> > > monitors are running a different version of the code and the OSDs are
>> > > pulling maps directly from a mon to ensure they are all in sync. It's
>> > > normal during many upgrades, but not expected for this particular
>> > > jump...
>> >
>> > Then I'm curious whether others are getting this from 0.94.6 to 0.94.7.
>> > For now I'm waiting to update our prod cluster.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> >
>> > >
>> > > sage
>> > >
>> > >
>> > >
>> > >> Thanks in advance!
>> > >>
>> > >> Dan
>> > >>
>> > >>
>> > >>
>> > >> On Fri, May 13, 2016 at 4:21 PM, Sage Weil <sage@xxxxxxxxxx> wrote:
>> > >> > This Hammer point release fixes several minor bugs. It also
>> > >> > includes a backport of an improved ‘ceph osd
>> > >> > reweight-by-utilization’ command for handling OSDs with
>> > >> > higher-than-average utilizations.
>> > >> >
>> > >> > We recommend that all hammer v0.94.x users upgrade.
>> > >> >
>> > >> > For more detailed information, see the release announcement at
>> > >> >
>> > >> >   http://ceph.com/releases/v0-94-7-hammer-released/
>> > >> >
>> > >> > or the complete changelog at
>> > >> >
>> > >> >   http://docs.ceph.com/docs/master/_downloads/v0.94.6.txt
>> > >> >
>> > >> > Getting Ceph
>> > >> > ------------
>> > >> >
>> > >> > * Git at git://github.com/ceph/ceph.git
>> > >> > * Tarball at http://download.ceph.com/tarballs/ceph-0.94.7.tar.gz
>> > >> > * For packages, see http://ceph.com/docs/master/install/get-packages
>> > >> > * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
>> > >> > _______________________________________________
>> > >> > ceph-users mailing list
>> > >> > ceph-users@xxxxxxxxxxxxxx
>> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >> >
>> > >>
>> > >>
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
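
P.S. For anyone hitting the same mon election storm: here's roughly
what I plan to put in place on the 2nd cluster before starting,
based on what helped above. It's only a sketch -- putting the options
under [osd] in ceph.conf and reverting them afterwards are my own
assumptions, so adjust for your environment.

  # ceph.conf on the OSD hosts, set before restarting any daemons:
  [osd]
      # keep OSDs from flooding the mons with cluster log messages,
      # which is what kept LogMonitor::preprocess_log busy on the leader
      clog to monitors = false
      # avoid extra osdmap churn from CRUSH location updates on restart
      osd crush update on start = false

  # on the cluster, before restarting anything:
  ceph osd set noout

  # once the upgrade is done:
  ceph osd unset noout
  # ...and drop (or inject back to its default) clog to monitors so the
  # cluster log reaches the mons again.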