Re: v0.94.7 Hammer released

Hi again,

We just finished the upgrade (5 mons, 1200 OSDs). As I mentioned, we
had loads of monitor elections and slow requests during the upgrades.
perf top showed the leader spending lots of time in LogMonitor::preprocess_log:

  43.79%  ceph-mon              [.] LogMonitor::preprocess_log

To mitigate, I tried a few things to minimize osdmap changes: I set
noout and set osd crush update on start = false. I also increased the
mon lease timeouts:

  ceph tell mon.* injectargs -- --mon_lease=15 \
      --mon_lease_renew_interval=9 --mon_lease_ack_timeout=30

None of that really helped. But finally I did:

  ceph tell osd.* injectargs -- --clog_to_monitors=false

which made things much better.

When I upgrade our 2nd cluster tomorrow, I'll set
clog_to_monitors=false before starting.
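
A minimal sketch of that pre-upgrade step, using only the two commands already mentioned in this thread (noout and clog_to_monitors=false). The run helper, the pre_upgrade function and the DRY_RUN variable are hypothetical conveniences for illustration, not part of ceph; by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the pre-upgrade mitigation described above.
# DRY_RUN is a hypothetical helper variable (not a ceph feature).
# It defaults to 1, so commands are printed rather than executed;
# set DRY_RUN=0 to actually run them against a cluster.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pre_upgrade() {
    # Keep OSDs from being marked out while they restart.
    run ceph osd set noout
    # Stop running OSDs from sending cluster log messages to the mons.
    run ceph tell 'osd.*' injectargs -- --clog_to_monitors=false
}

pre_upgrade
```

Note that injectargs only changes daemons that are currently running, so an OSD restarted during the upgrade would fall back to whatever ceph.conf says. Presumably the setting would also need to go into ceph.conf for it to survive restarts, and noout would be cleared again with ceph osd unset noout once the upgrade is done.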

Cheers, Dan


On Tue, May 24, 2016 at 10:02 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I'm mid-upgrade on a large cluster now. The upgrade is not going smoothly --
> it looks like the ceph-mon's are getting bombarded by so many of these crc
> error warnings that they go into elections.
>
> Did anyone upgrade a large cluster from 0.94.6 to 0.94.7 ? If not I'd advise
> waiting until this is better understood.
>
> Cheers, Dan
>
>
> On Tue, May 17, 2016 at 2:14 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>>
>>
>> Hello,
>>
>> for the record, I did the exact same sequence (no MDS) on my test cluster
>> with exactly the same results.
>>
>> I didn't report it as I assumed it to be a noisy (but harmless)
>> upgrade artifact.
>>
>> Christian
>>
>> On Tue, 17 May 2016 14:07:21 +0200 Dan van der Ster wrote:
>>
>> > On Tue, May 17, 2016 at 1:56 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> > > On Tue, 17 May 2016, Dan van der Ster wrote:
>> > >> Hi Sage et al,
>> > >>
>> > >> I'm updating our pre-prod cluster from 0.94.6 to 0.94.7 and after
>> > >> upgrading the ceph-mon's I'm getting loads of warnings like:
>> > >>
>> > >> 2016-05-17 10:01:29.314785 osd.76 [WRN] failed to encode map e103116 with expected crc
>> > >>
>> > >> I've seen that error is whitelisted in the qa-suite:
>> > >> https://github.com/ceph/ceph-qa-suite/pull/602/files
>> > >>
>> > >> Is it really harmless? (This is the first time I've seen such a
>> > >> warning).
>> > >
>> > > Are you sure you were upgrading from v0.94.6?
>> >
>> > Absolutely. I first updated the mons, which I restarted into quorum
>> > with 0.94.7. Then any changes to the osdmap triggered the failed to
>> > encode warning.
>> > The upgrade sequence went like this:
>> >
>> > Update mons 0.94.6 to 0.94.7, restart, quorum. No warnings.
>> > Update mds's 0.94.6 to 0.94.7, restart. Warnings from ~all osds.
>> > Update osds 0.94.6 to 0.94.7, restart host by host. The 0.94.6 osds
>> > printed warnings, the new OSDs did not.
>> >
>> > > I don't see anything that
>> > > would trigger these warnings going from .6 to .7, which is strange.
>> >
>> > Could the osdmap GMT hitset changes have caused it? Commits Mar 24 here:
>> >
>> >    https://github.com/ceph/ceph/compare/v0.94.6...v0.94.7?expand=1
>> >
>> > > That said, the errors are generally harmless--it just means the
>> > > monitors are running a different version of the code and the OSDs are
>> > > pulling maps directly from a mon to ensure they are all in sync.  It's
>> > > normal during many upgrades, but not expected for this particular
>> > > jump...
>> >
>> > Then I'm curious if others are getting this from 0.94.6 to 0.94.7.
>> > For now I'm waiting to update our prod cluster.
>> >
>> > Thanks!
>> >
>> > Dan
>> >
>> >
>> > >
>> > > sage
>> > >
>> > >
>> > >
>> > >
>> > >> Thanks in advance!
>> > >>
>> > >> Dan
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Fri, May 13, 2016 at 4:21 PM, Sage Weil <sage@xxxxxxxxxx> wrote:
>> > >> > This Hammer point release fixes several minor bugs. It also
>> > >> > includes a backport of an improved ‘ceph osd
>> > >> > reweight-by-utilization’ command for handling OSDs with
>> > >> > higher-than-average utilizations.
>> > >> >
>> > >> > We recommend that all hammer v0.94.x users upgrade.
>> > >> >
>> > >> > For more detailed information, see the release announcement at
>> > >> >
>> > >> >         http://ceph.com/releases/v0-94-7-hammer-released/
>> > >> >
>> > >> > or the complete changelog at
>> > >> >
>> > >> >         http://docs.ceph.com/docs/master/_downloads/v0.94.6.txt
>> > >> >
>> > >> > Getting Ceph
>> > >> > ------------
>> > >> >
>> > >> > * Git at git://github.com/ceph/ceph.git
>> > >> > * Tarball at http://download.ceph.com/tarballs/ceph-0.94.7.tar.gz
>> > >> > * For packages, see
>> > >> > http://ceph.com/docs/master/install/get-packages
>> > >> > * For ceph-deploy, see
>> > >> > http://ceph.com/docs/master/install/install-ceph-deploy
>> > >> > _______________________________________________
>> > >> > ceph-users mailing list
>> > >> > ceph-users@xxxxxxxxxxxxxx
>> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >> >
>> > >>
>> > >>
>>
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
>
>



