Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

On Fri, 1 May 2015, tuomas.juntunen@xxxxxxxxxxxxxxx wrote:
> Hi
> 
> I deleted the images and img pools and started the OSDs, but they still die.
> 
> Here's a log from one of the OSDs after this, if you need it.
> 
> http://beta.xaasbox.com/ceph/ceph-osd.19.log

I've pushed another commit that should avoid this case, sha1
425bd4e1dba00cc2243b0c27232d1f9740b04e34.

Note that once the pools are fully deleted (it shouldn't take long once the 
OSDs are up and stable), you should switch back to the normal packages that 
don't have these workarounds.

sage



> 
> Br,
> Tuomas
> 
> 
> > Thanks man. I'll try it tomorrow. Have a good one.
> >
> > Br,T
> >
> > -------- Original message --------
> > From: Sage Weil <sage@xxxxxxxxxxxx>
> > Date: 30/04/2015  18:23  (GMT+02:00)
> > To: Tuomas Juntunen <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx, ceph-devel@xxxxxxxxxxxxxxx
> > Subject: RE:  Upgrade from Giant to Hammer and after some basic operations
> > most of the OSD's went down
> >
> > On Thu, 30 Apr 2015, tuomas.juntunen@xxxxxxxxxxxxxxx wrote:
> >> Hey
> >>
> >> Yes, I can drop the images data. Do you think this will fix it?
> >
> > It's a slightly different assert that (I believe) should not trigger once
> > the pool is deleted.  Please give that a try and if you still hit it I'll
> > whip up a workaround.
> >
> > Thanks!
> > sage
> >
> >>
> >> Br,
> >>
> >> Tuomas
> >>
> >> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> >> >> Hi
> >> >>
> >> >> I updated to that version and it seems something did happen: the OSDs
> >> >> stayed up for a while and 'ceph status' got updated. But then, in a couple
> >> >> of minutes, they all went down the same way.
> >> >>
> >> >> I have attached a new 'ceph osd dump -f json-pretty' and got a new log from
> >> >> one of the OSDs with debug osd = 20:
> >> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
> >> >
> >> > Sam mentioned that you had said earlier that this was not critical data?
> >> > If not, I think the simplest thing is to just drop those pools.  The
> >> > important thing (from my perspective at least :) is that we understand the
> >> > root cause and can prevent this in the future.
> >> >
> >> > sage
> >> >
> >> >
> >> >>
> >> >> Thank you!
> >> >>
> >> >> Br,
> >> >> Tuomas
> >> >>
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> >> >> Sent: 28 April 2015 23:57
> >> >> To: Tuomas Juntunen
> >> >> Cc: ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> >> >> Subject: Re:  Upgrade from Giant to Hammer and after some basic
> >> >> operations most of the OSD's went down
> >> >>
> >> >> Hi Tuomas,
> >> >>
> >> >> I've pushed an updated wip-hammer-snaps branch.  Can you please try it?
> >> >> The build will appear here
> >> >>
> >> >>
> >> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
> >> >>
> >> >> (or a similar url; adjust for your distro).
> >> >>
> >> >> Thanks!
> >> >> sage
> >> >>
> >> >>
> >> >> On Tue, 28 Apr 2015, Sage Weil wrote:
> >> >>
> >> >> > [adding ceph-devel]
> >> >> >
> >> >> > Okay, I see the problem.  This seems to be unrelated to the Giant ->
> >> >> > Hammer move... it's a result of the tiering changes you made:
> >> >> >
> >> >> > > > > > > > The following:
> >> >> > > > > > > >
> >> >> > > > > > > > ceph osd tier add img images --force-nonempty
> >> >> > > > > > > > ceph osd tier cache-mode images forward
> >> >> > > > > > > > ceph osd tier set-overlay img images
> >> >> >
> >> >> > Specifically, --force-nonempty bypassed important safety checks.
> >> >> >
> >> >> > 1. images had snapshots (and removed_snaps)
> >> >> >
> >> >> > 2. images was added as a tier *of* img, and img's removed_snaps was
> >> >> > copied to images, clobbering the removed_snaps value (see
> >> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
> >> >> >
> >> >> > 3. tiering relation was undone, but removed_snaps was still gone
> >> >> >
> >> >> > 4. on OSD startup, when we load the PG, removed_snaps is initialized
> >> >> > with the older map.  Later, in PGPool::update(), we assume that
> >> >> > removed_snaps always grows (never shrinks) and we trigger an assert
> >> >> > (a minimal sketch of the failure follows below).
> >> >> >
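> >> >> > A minimal, self-contained sketch of how that assert fires when
> >> >> > removed_snaps shrinks (illustrative only -- a toy structure, not the
> >> >> > actual Ceph interval_set, and the snap ids are made up):
> >> >> >
> >> >> >   // toy_interval_set.cpp -- illustration only, not Ceph code
> >> >> >   #include <cassert>
> >> >> >   #include <cstdint>
> >> >> >   #include <map>
> >> >> >
> >> >> >   struct toy_interval_set {
> >> >> >     std::map<uint64_t, uint64_t> m;  // start -> length
> >> >> >     int64_t size = 0;                // total ids covered
> >> >> >
> >> >> >     void insert(uint64_t start, uint64_t len) { m[start] = len; size += len; }
> >> >> >
> >> >> >     // erase() assumes the interval is present; like the real code it
> >> >> >     // keeps a running size and asserts it never goes negative.
> >> >> >     void erase(uint64_t start, uint64_t len) {
> >> >> >       auto it = m.find(start);
> >> >> >       if (it != m.end() && it->second == len)
> >> >> >         m.erase(it);
> >> >> >       size -= len;
> >> >> >       assert(size >= 0);  // cf. interval_set.h: 385: FAILED assert(_size >= 0)
> >> >> >     }
> >> >> >
> >> >> >     void subtract(const toy_interval_set& other) {
> >> >> >       for (const auto& p : other.m)
> >> >> >         erase(p.first, p.second);
> >> >> >     }
> >> >> >   };
> >> >> >
> >> >> >   int main() {
> >> >> >     toy_interval_set cached;    // removed_snaps the PG kept from the older map
> >> >> >     cached.insert(1, 4);        // pretend snaps 1..4 had been removed
> >> >> >     toy_interval_set from_map;  // clobbered value from the newer map: empty
> >> >> >     from_map.subtract(cached);  // the shrunken set can't absorb the old one -> abort
> >> >> >     return 0;
> >> >> >   }
> >> >> >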
> >> >> > To fix this I think we need to do 2 things:
> >> >> >
> >> >> > 1. Make the OSD forgiving about removed_snaps getting smaller.  This is
> >> >> > probably a good thing anyway: once we know snaps are removed on all
> >> >> > OSDs we can prune the interval_set in the OSDMap.  Maybe.  (A possible
> >> >> > approach is sketched below.)
> >> >> >
> >> >> > 2. Fix the mon to prevent this from happening, *even* when
> >> >> > --force-nonempty is specified.  (This is the root cause.)
> >> >> >
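> >> >> > For (1), a rough sketch of what "forgiving" could look like: compute the
> >> >> > genuinely new removals without assuming the map's set is a superset of
> >> >> > the cached one, and just note when it shrank.  Illustrative only (plain
> >> >> > std::set instead of interval_set, made-up snap ids), not the actual
> >> >> > PGPool::update() change:
> >> >> >
> >> >> >   // forgiving_diff.cpp -- illustration only, not Ceph code
> >> >> >   #include <cstdint>
> >> >> >   #include <iostream>
> >> >> >   #include <set>
> >> >> >
> >> >> >   using snap_set = std::set<uint64_t>;
> >> >> >
> >> >> >   // Snaps removed in `cur` but not yet seen in `prev`; reports (rather
> >> >> >   // than asserts on) the case where `prev` has entries `cur` lacks.
> >> >> >   snap_set newly_removed(const snap_set& prev, const snap_set& cur, bool& shrank) {
> >> >> >     snap_set result;
> >> >> >     shrank = false;
> >> >> >     for (uint64_t s : cur)
> >> >> >       if (!prev.count(s))
> >> >> >         result.insert(s);
> >> >> >     for (uint64_t s : prev)
> >> >> >       if (!cur.count(s)) {
> >> >> >         shrank = true;
> >> >> >         break;
> >> >> >       }
> >> >> >     return result;
> >> >> >   }
> >> >> >
> >> >> >   int main() {
> >> >> >     snap_set cached = {1, 2, 3, 4};  // removed_snaps from the older map
> >> >> >     snap_set from_map;               // clobbered (smaller) set from the newer map
> >> >> >     bool shrank = false;
> >> >> >     snap_set fresh = newly_removed(cached, from_map, shrank);
> >> >> >     if (shrank)
> >> >> >       std::cout << "removed_snaps shrank; tolerating it\n";
> >> >> >     std::cout << "newly removed: " << fresh.size() << "\n";
> >> >> >     return 0;
> >> >> >   }
> >> >> >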
> >> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >
> >> >> > > > > > > >
> >> >> > > > > > > > The idea was to make images a tier of img, move the data to
> >> >> > > > > > > > img, and then change clients to use the new img pool.
> >> >> > > > > > > >
> >> >> > > > > > > > Br,
> >> >> > > > > > > > Tuomas
> >> >> > > > > > > >
> >> >> > > > > > > > > Can you explain exactly what you mean by:
> >> >> > > > > > > > >
> >> >> > > > > > > > > "Also I created one pool for tier to be able to move
> >> >> > > > > > > > > data without
> >> >> > > > > > > outage."
> >> >> > > > > > > > >
> >> >> > > > > > > > > -Sam
> >> >> > > > > > > > > ----- Original Message -----
> >> >> > > > > > > > > From: "tuomas juntunen"
> >> >> > > > > > > > > <tuomas.juntunen@xxxxxxxxxxxxxxx>
> >> >> > > > > > > > > To: "Ian Colle" <icolle@xxxxxxxxxx>
> >> >> > > > > > > > > Cc: ceph-users@xxxxxxxxxxxxxx
> >> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> >> >> > > > > > > > > Subject: Re:  Upgrade from Giant to Hammer
> >> >> > > > > > > > > and after some basic operations most of the OSD's went
> >> >> > > > > > > > > down
> >> >> > > > > > > > >
> >> >> > > > > > > > > Hi
> >> >> > > > > > > > >
> >> >> > > > > > > > > Any solution for this yet?
> >> >> > > > > > > > >
> >> >> > > > > > > > > Br,
> >> >> > > > > > > > > Tuomas
> >> >> > > > > > > > >
> >> >> > > > > > > > >> It looks like you may have hit
> >> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Ian R. Colle
> >> >> > > > > > > > >> Global Director of Software Engineering
> >> >> > > > > > > > >> Red Hat (Inktank is now part of Red Hat!)
> >> >> > > > > > > > >> http://www.linkedin.com/in/ircolle
> >> >> > > > > > > > >> http://www.twitter.com/ircolle
> >> >> > > > > > > > >> Cell: +1.303.601.7713
> >> >> > > > > > > > >> Email: icolle@xxxxxxxxxx
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> ----- Original Message -----
> >> >> > > > > > > > >> From: "tuomas juntunen"
> >> >> > > > > > > > >> <tuomas.juntunen@xxxxxxxxxxxxxxx>
> >> >> > > > > > > > >> To: ceph-users@xxxxxxxxxxxxxx
> >> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> >> > > > > > > > >> Subject:  Upgrade from Giant to Hammer and
> >> >> > > > > > > > >> after some basic operations most of the OSD's went down
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Then created new pools and deleted some old ones. Also
> >> >> > > > > > > > >> I created one pool for tier to be able to move data
> >> >> > > > > > > > >> without outage.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> After these operations all but 10 OSDs are down and
> >> >> > > > > > > > >> producing this kind of message in the logs; I get more
> >> >> > > > > > > > >> than 100 GB of these in a night:
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>    -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started
> >> >> > > > > > > > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Start
> >> >> > > > > > > > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> >> > > > > > > > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> >> >> > > > > > > > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started/Stray
> >> >> > > > > > > > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Reset 0.119467 4 0.000037
> >> >> > > > > > > > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started
> >> >> > > > > > > > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Start
> >> >> > > > > > > > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> >> > > > > > > > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Start 0.000020 0 0.000000
> >> >> > > > > > > > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started/Stray
> >> >> > > > > > > > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Reset 7.511623 45 0.000165
> >> >> > > > > > > > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started
> >> >> > > > > > > > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Start
> >> >> > > > > > > > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
> >> >> > > > > > > > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> >> >> > > > > > > > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
> >> >> > > > > > > > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
> >> >> > > > > > > > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> >> >> > > > > > > > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread 7fd8e748d700 time 2015-04-27 10:17:08.809899
> >> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >> >> > > > > > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
> >> >> > > > > > > > >>  2: (interval_set<snapid_t>::subtract(interval_set<snapid_t> const&)+0xb0) [0x82cd50]
> >> >> > > > > > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap const>)+0x52e) [0x80113e]
> >> >> > > > > > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> >> >> > > > > > > > >>  5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) [0x6b0e43]
> >> >> > > > > > > > >>  6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> >> >> > > > > > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x709278]
> >> >> > > > > > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
> >> >> > > > > > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> >> >> > > > > > > > >>  10: (()+0x8182) [0x7fd906946182]
> >> >> > > > > > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Also by monitoring (ceph -w) I get the following
> >> >> > > > > > > > >> messages, also lots of them.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.? 10.20.0.13:0/1174409' entity='osd.30' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: dispatch
> >> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.? 10.20.0.13:0/1174483' entity='osd.26' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: dispatch
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSDs; the nodes are
> >> >> > > > > > > > >> also mons and MDSs to save servers. All run Ubuntu 14.04.2.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> I have pretty much tried everything I could think of.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Restarting daemons doesn't help.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Any help would be appreciated. I can also provide more
> >> >> > > > > > > > >> logs if necessary; they just seem to get pretty large
> >> >> > > > > > > > >> in a few moments.
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> Thank you
> >> >> > > > > > > > >> Tuomas
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >> >> > > > > > > > >> _______________________________________________
> >> >> > > > > > > > >> ceph-users mailing list ceph-users@xxxxxxxxxxxxxx
> >> >> > > > > > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >> >> > > > > > > > >>
> >>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
