Re: Upgrade from Giant to Hammer and after some basic operations most of the OSD's went down

On Tue, 28 Apr 2015, Tuomas Juntunen wrote:
> To add some more interesting behavior to my problem: the monitors are not
> updating the status of the OSDs.

Yeah, this is very strange.  I see

2015-04-27 22:25:26.142245 7f78d793a900 10 osd.15 17882 _get_pool 4 cached_removed_snaps [1~1,4~a,f~2,12~10,23~4,28~2,2e~2,32~c,3f~4]

but pool 4 (images) has no snaps and a zeroed snap_seq in the osdmap dump 
you provided.  Can you attach

 ceph osd dump -f json-pretty 17882

?
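
To make the failure mode concrete, here is a minimal, self-contained C++
sketch (not Ceph's actual interval_set; SimpleIntervalSet and its members are
hypothetical stand-ins) of why subtracting a non-empty cached_removed_snaps
from an empty newly_removed_snaps trips an assert like the
"FAILED assert(_size >= 0)" in the backtrace quoted further down:

#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for an interval set stored as start -> length,
// mirroring the "[1~1,4~a,...]" notation printed by the osd at debug 10.
struct SimpleIntervalSet {
  std::map<uint64_t, uint64_t> m;
  int64_t _size = 0;

  void insert(uint64_t start, uint64_t len) {
    m[start] = len;
    _size += len;
  }

  // erase() assumes [start, start+len) is present; if it is not, the running
  // size goes negative, analogous to the check at interval_set.h:385.
  void erase(uint64_t start, uint64_t len) {
    auto it = m.find(start);
    if (it != m.end() && it->second == len)
      m.erase(it);
    _size -= len;
    assert(_size >= 0);  // fires when the interval was never there
  }

  void subtract(const SimpleIntervalSet& other) {
    for (const auto& [start, len] : other.m)
      erase(start, len);
  }
};

int main() {
  // newly_removed_snaps as built from the new osdmap: empty, because the
  // pool reports removed_snaps "[]".
  SimpleIntervalSet newly_removed_snaps;

  // cached_removed_snaps left over from an older map, e.g. [1~1,4~a].
  SimpleIntervalSet cached_removed_snaps;
  cached_removed_snaps.insert(0x1, 0x1);
  cached_removed_snaps.insert(0x4, 0xa);

  // Mirrors the quoted newly_removed_snaps.subtract(cached_removed_snaps):
  // subtracting a non-empty set from an empty one asserts.
  newly_removed_snaps.subtract(cached_removed_snaps);
  return 0;
}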

The fact that your mon status isn't changing makes me wonder if these are 
two different ceph clusters or something similarly silly...  If the mds 
daemons are down, the status should show stale.  If all osds stop, you might 
not see the osds go down for ~10 minutes (since they aren't monitoring 
each other).

sage

> 
> Even when I stop all the remaining OSDs, ceph osd tree shows them as up.
> The status of the mons and mds also doesn't seem to update correctly, in my
> opinion.
> 
> Below is a copy of the status when one mon and one mds are stopped and all
> of the OSDs are also stopped.
> 
>      monmap e7: 3 mons at
> {ceph1=10.20.0.11:6789/0,ceph2=10.20.0.12:6789/0,ceph3=10.20.0.13:6789/0}
>             election epoch 48, quorum 0,1 ceph1,ceph2
>      mdsmap e1750: 1/1/1 up {0=ceph2=up:replay}, 1 up:standby
>      osdmap e18132: 37 osds: 11 up, 11 in
> 
> Br,
> Tuomas
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
> Sent: 27 April 2015 22:22
> To: Tuomas Juntunen
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: RE:  Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
> 
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > Hey
> > 
> > Got the log, you can get it from
> > http://beta.xaasbox.com/ceph/ceph-osd.15.log
> 
> Can you repeat this with 'debug osd = 20'?  Thanks!
> 
> sage
> 
> > 
> > Br,
> > Tuomas
> > 
> > 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: 27 April 2015 20:45
> > To: Tuomas Juntunen
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re:  Upgrade from Giant to Hammer and after some 
> > basic operations most of the OSD's went down
> > 
> > Yeah, no snaps:
> > 
> > images:
> >             "snap_mode": "selfmanaged",
> >             "snap_seq": 0,
> >             "snap_epoch": 17882,
> >             "pool_snaps": [],
> >             "removed_snaps": "[]",
> > 
> > img:
> >             "snap_mode": "selfmanaged",
> >             "snap_seq": 0,
> >             "snap_epoch": 0,
> >             "pool_snaps": [],
> >             "removed_snaps": "[]",
> > 
> > ...and actually the log shows this happens on pool 2 (rbd), which has
> > 
> >             "snap_mode": "selfmanaged",
> >             "snap_seq": 0,
> >             "snap_epoch": 0,
> >             "pool_snaps": [],
> >             "removed_snaps": "[]",
> > 
> > I'm guessing the offending code is
> > 
> >     pi->build_removed_snaps(newly_removed_snaps);
> >     newly_removed_snaps.subtract(cached_removed_snaps);
> > 
> > so newly_removed_snaps should be empty, and apparently 
> > cached_removed_snaps is not?  Maybe one of your older osdmaps has snap 
> > info for rbd?  It doesn't make sense.  :/  Maybe
> > 
> >  ceph osd dump 18127 -f json-pretty
> > 
> > just to be certain?  I've pushed a branch 'wip-hammer-snaps' that will 
> > appear at gitbuilder.ceph.com in 20-30 minutes and will output some 
> > additional debug info.  It will be at
> > 
> > http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-snaps
> > 
> > or similar, depending on your distro.  Can you install it on one node 
> > and start an osd with logging to reproduce the crash?
> > 
> > Thanks!
> > sage
> > 
> > 
> > On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > 
> > > Hi
> > > 
> > > Here you go
> > > 
> > > Br,
> > > Tuomas
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > > Sent: 27 April 2015 19:23
> > > To: Tuomas Juntunen
> > > Cc: 'Samuel Just'; ceph-users@xxxxxxxxxxxxxx
> > > Subject: Re:  Upgrade from Giant to Hammer and after 
> > > some basic operations most of the OSD's went down
> > > 
> > > On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > > > Thanks for the info.
> > > > 
> > > > To my knowledge there were no snapshots on that pool, but I cannot 
> > > > verify that.
> > > 
> > > Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a 
> > > bit more light on what happened (and on the simplest way to fix it).
> > > 
> > > sage
> > > 
> > > 
> > > > Any way to make this work again? Removing the tier and other 
> > > > settings didn't fix it; I tried that the second this happened.
> > > > 
> > > > Br,
> > > > Tuomas
> > > > 
> > > > -----Original Message-----
> > > > From: Samuel Just [mailto:sjust@xxxxxxxxxx]
> > > > Sent: 27 April 2015 15:50
> > > > To: tuomas juntunen
> > > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > > Subject: Re:  Upgrade from Giant to Hammer and after 
> > > > some basic operations most of the OSD's went down
> > > > 
> > > > So, the base tier is what determines the snapshots for the
> > > > cache/base pool amalgam.  You added a populated pool complete with
> > > > snapshots on top of a base tier without snapshots.  Apparently, it
> > > > caused an existential crisis for the snapshot code.  That's one of
> > > > the reasons why there is a --force-nonempty flag for that operation,
> > > > I think.  I think the immediate answer is probably to disallow pools
> > > > with snapshots as a cache tier altogether until we think of a good
> > > > way to make it work.
> > > > -Sam
> > > > 
> > > > ----- Original Message -----
> > > > From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > > > To: "Samuel Just" <sjust@xxxxxxxxxx>
> > > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > > Sent: Monday, April 27, 2015 4:56:58 AM
> > > > Subject: Re:  Upgrade from Giant to Hammer and after 
> > > > some basic operations most of the OSD's went down
> > > > 
> > > > 
> > > > 
> > > > The following:
> > > > 
> > > > ceph osd tier add img images --force-nonempty
> > > > ceph osd tier cache-mode images forward
> > > > ceph osd tier set-overlay img images
> > > > 
> > > > The idea was to make images a tier of img, move the data to img, and
> > > > then change clients to use the new img pool.
> > > > 
> > > > Br,
> > > > Tuomas
> > > > 
> > > > > Can you explain exactly what you mean by:
> > > > >
> > > > > "Also I created one pool for tier to be able to move data 
> > > > > without
> > > outage."
> > > > >
> > > > > -Sam
> > > > > ----- Original Message -----
> > > > > From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > > > > To: "Ian Colle" <icolle@xxxxxxxxxx>
> > > > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > > > Subject: Re:  Upgrade from Giant to Hammer and after 
> > > > > some basic operations most of the OSD's went down
> > > > >
> > > > > Hi
> > > > >
> > > > > Any solution for this yet?
> > > > >
> > > > > Br,
> > > > > Tuomas
> > > > >
> > > > >> It looks like you may have hit
> > > > >> http://tracker.ceph.com/issues/7915
> > > > >>
> > > > >> Ian R. Colle
> > > > >> Global Director
> > > > >> of Software Engineering
> > > > >> Red Hat (Inktank is now part of Red Hat!) 
> > > > >> http://www.linkedin.com/in/ircolle
> > > > >> http://www.twitter.com/ircolle
> > > > >> Cell: +1.303.601.7713
> > > > >> Email: icolle@xxxxxxxxxx
> > > > >>
> > > > >> ----- Original Message -----
> > > > >> From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > > > >> To: ceph-users@xxxxxxxxxxxxxx
> > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > > > >> Subject:  Upgrade from Giant to Hammer and after 
> > > > >> some basic operations most of the OSD's went down
> > > > >>
> > > > >>
> > > > >>
> > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer
> > > > >>
> > > > >> Then created new pools and deleted some old ones. Also I 
> > > > >> created one pool for tier to be able to move data without outage.
> > > > >>
> > > > >> After these operations all but 10 OSDs are down and writing this 
> > > > >> kind of message to the logs; I get more than 100 GB of these in a 
> > > > >> night:
> > > > >>
> > > > >>   -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started
> > > > >>   -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Start
> > > > >>   -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > >>   -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > > > >>   -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started/Stray
> > > > >>   -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > > > >>   -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started
> > > > >>   -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Start
> > > > >>   -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > > > >>   -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Start 0.000020 0 0.000000
> > > > >>    -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started/Stray
> > > > >>    -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Reset 7.511623 45 0.000165
> > > > >>    -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started
> > > > >>    -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Start
> > > > >>    -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
> > > > >>    -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> > > > >>    -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
> > > > >>    -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
> > > > >>    -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> > > > >>     0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > > > >>
> > > > >>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > > > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
> > > > >>  2: (interval_set<snapid_t>::subtract(interval_set<snapid_t> const&)+0xb0) [0x82cd50]
> > > > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap const>)+0x52e) [0x80113e]
> > > > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > > > >>  5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) [0x6b0e43]
> > > > >>  6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > > > >>  7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x709278]
> > > > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
> > > > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > > > >>  10: (()+0x8182) [0x7fd906946182]
> > > > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > > > >>
> > > > >> Monitoring with 'ceph -w' also shows lots of messages like the
> > > > >> following:
> > > > >>
> > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.? 10.20.0.13:0/1174409' entity='osd.30' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: dispatch
> > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.? 10.20.0.13:0/1174483' entity='osd.26' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: dispatch
> > > > >>
> > > > >>
> > > > >> This is a cluster of 3 nodes with 36 OSDs; the nodes also act as 
> > > > >> mons and mds's to save servers. All run Ubuntu 14.04.2.
> > > > >>
> > > > >> I have pretty much tried everything I could think of.
> > > > >>
> > > > >> Restarting daemons doesn't help.
> > > > >>
> > > > >> Any help would be appreciated. I can also provide more logs if 
> > > > >> necessary. They just seem to get pretty large in a few moments.
> > > > >>
> > > > >> Thank you
> > > > >> Tuomas
> > > > >>
> > > > >>
> > > > >> _______________________________________________
> > > > >> ceph-users mailing list
> > > > >> ceph-users@xxxxxxxxxxxxxx
> > > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > >
> > > > >
> > > > >
> > > > 
> > > > 
> > > > _______________________________________________
> > > > ceph-users mailing list
> > > > ceph-users@xxxxxxxxxxxxxx
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > ceph-users mailing list
> > > > ceph-users@xxxxxxxxxxxxxx
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > 
> > > > 
> > > 
> > 
> > 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



