Yeah, no snaps:

images:
    "snap_mode": "selfmanaged",
    "snap_seq": 0,
    "snap_epoch": 17882,
    "pool_snaps": [],
    "removed_snaps": "[]",

img:
    "snap_mode": "selfmanaged",
    "snap_seq": 0,
    "snap_epoch": 0,
    "pool_snaps": [],
    "removed_snaps": "[]",

...and actually the log shows this happens on pool 2 (rbd), which has

    "snap_mode": "selfmanaged",
    "snap_seq": 0,
    "snap_epoch": 0,
    "pool_snaps": [],
    "removed_snaps": "[]",

I'm guessing the offending code is

	pi->build_removed_snaps(newly_removed_snaps);
	newly_removed_snaps.subtract(cached_removed_snaps);

so newly_removed_snaps should be empty, and apparently
cached_removed_snaps is not?  Maybe one of your older osdmaps has snap
info for rbd?  It doesn't make sense. :/

Maybe 'ceph osd dump 18127 -f json-pretty' just to be certain?

I've pushed a branch 'wip-hammer-snaps' that will appear at
gitbuilder.ceph.com in 20-30 minutes and will output some additional
debug info.  It will be at
http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-hammer-sanps
or similar, depending on your distro.  Can you install it on one node and
start an osd with logging to reproduce the crash?

Thanks!
sage


On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> Hi
>
> Here you go
>
> Br,
> Tuomas
>
>
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: 27 April 2015 19:23
> To: Tuomas Juntunen
> Cc: 'Samuel Just'; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Upgrade from Giant to Hammer and after some basic
> operations most of the OSD's went down
>
> On Mon, 27 Apr 2015, Tuomas Juntunen wrote:
> > Thanks for the info.
> >
> > To my knowledge there were no snapshots on that pool, but I cannot
> > verify that.
>
> Can you attach a 'ceph osd dump -f json-pretty'?  That will shed a bit
> more light on what happened (and the simplest way to fix it).
>
> sage
>
> > Any way to make this work again?  Removing the tier and other settings
> > didn't fix it; I tried that the second this happened.
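The requested 'ceph osd dump -f json-pretty' output can also be checked mechanically for pools that still carry removed-snap intervals. A hedged Python sketch: the field names ("pools", "pool", "pool_name", "removed_snaps") and the string form of removed_snaps are taken from the snippets quoted in this thread, so verify them against a real dump.

```python
import json

def pools_with_removed_snaps(dump):
    """Return (pool id, pool name) for pools whose removed_snaps is non-empty.

    `dump` is the parsed JSON from 'ceph osd dump -f json-pretty'.  Field
    names are assumptions based on the output quoted in this thread.
    """
    hits = []
    for p in dump.get("pools", []):
        # removed_snaps is serialized as a string like "[]" or "[1~5]"
        if p.get("removed_snaps", "[]").strip() != "[]":
            hits.append((p.get("pool"), p.get("pool_name")))
    return hits

# Small made-up sample mimicking the dump excerpts above:
sample = json.loads("""
{"pools": [
  {"pool": 2, "pool_name": "rbd",    "removed_snaps": "[]"},
  {"pool": 4, "pool_name": "images", "removed_snaps": "[1~5]"}
]}
""")

print(pools_with_removed_snaps(sample))  # -> [(4, 'images')]
```

Running this over every cached osdmap epoch would show whether an older map still holds snap info for rbd, which is what Sage suspects above.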
> >
> > Br,
> > Tuomas
> >
> > -----Original Message-----
> > From: Samuel Just [mailto:sjust@xxxxxxxxxx]
> > Sent: 27 April 2015 15:50
> > To: tuomas juntunen
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: Upgrade from Giant to Hammer and after some basic
> > operations most of the OSD's went down
> >
> > So, the base tier is what determines the snapshots for the cache/base
> > pool amalgam.  You added a populated pool, complete with snapshots, on
> > top of a base tier without snapshots.  Apparently it caused an
> > existential crisis for the snapshot code.  That's one of the reasons
> > why there is a --force-nonempty flag for that operation, I think.  I
> > think the immediate answer is probably to disallow pools with
> > snapshots as a cache tier altogether until we think of a good way to
> > make it work.
> > -Sam
> >
> > ----- Original Message -----
> > From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > To: "Samuel Just" <sjust@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Sent: Monday, April 27, 2015 4:56:58 AM
> > Subject: Re: Upgrade from Giant to Hammer and after some basic
> > operations most of the OSD's went down
> >
> > The following:
> >
> > ceph osd tier add img images --force-nonempty
> > ceph osd tier cache-mode images forward
> > ceph osd tier set-overlay img images
> >
> > The idea was to make images a tier of img, move the data to img, then
> > change clients to use the new img pool.
> >
> > Br,
> > Tuomas
> >
> > > Can you explain exactly what you mean by:
> > >
> > > "Also I created one pool for tier to be able to move data without
> > > outage."
> > >
> > > -Sam
> > > ----- Original Message -----
> > > From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > > To: "Ian Colle" <icolle@xxxxxxxxxx>
> > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > Sent: Monday, April 27, 2015 4:23:44 AM
> > > Subject: Re: Upgrade from Giant to Hammer and after some basic
> > > operations most of the OSD's went down
> > >
> > > Hi
> > >
> > > Any solution for this yet?
> > >
> > > Br,
> > > Tuomas
> > >
> > >> It looks like you may have hit http://tracker.ceph.com/issues/7915
> > >>
> > >> Ian R. Colle
> > >> Global Director of Software Engineering
> > >> Red Hat (Inktank is now part of Red Hat!)
> > >> http://www.linkedin.com/in/ircolle
> > >> http://www.twitter.com/ircolle
> > >> Cell: +1.303.601.7713
> > >> Email: icolle@xxxxxxxxxx
> > >>
> > >> ----- Original Message -----
> > >> From: "tuomas juntunen" <tuomas.juntunen@xxxxxxxxxxxxxxx>
> > >> To: ceph-users@xxxxxxxxxxxxxx
> > >> Sent: Monday, April 27, 2015 1:56:29 PM
> > >> Subject: Upgrade from Giant to Hammer and after some basic
> > >> operations most of the OSD's went down
> > >>
> > >> I upgraded Ceph from 0.87 Giant to 0.94.1 Hammer.
> > >>
> > >> Then I created new pools and deleted some old ones.  I also created
> > >> one pool as a tier to be able to move data without an outage.
> > >>
> > >> After these operations all but 10 OSD's are down and are writing this
> > >> kind of message to the logs; I get more than 100 GB of these in a
> > >> night:
> > >>
> > >>    -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started
> > >>    -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Start
> > >>    -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > >>    -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> > >>    -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started/Stray
> > >>    -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Reset 0.119467 4 0.000037
> > >>    -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started
> > >>    -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Start
> > >>    -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> > >>    -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Start 0.000020 0 0.000000
> > >>     -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started/Stray
> > >>     -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Reset 7.511623 45 0.000165
> > >>     -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started
> > >>     -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Start
> > >>     -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
> > >>     -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> > >>     -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
> > >>     -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
> > >>     -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> > >>      0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread 7fd8e748d700 time 2015-04-27 10:17:08.809899
> > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> > >>
> > >>  ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
> > >>  2: (interval_set<snapid_t>::subtract(interval_set<snapid_t> const&)+0xb0) [0x82cd50]
> > >>  3: (PGPool::update(std::tr1::shared_ptr<OSDMap const>)+0x52e) [0x80113e]
> > >>  4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> > >>  5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) [0x6b0e43]
> > >>  6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> > >>  7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x709278]
> > >>  8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
> > >>  9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> > >>  10: (()+0x8182) [0x7fd906946182]
> > >>  11: (clone()+0x6d) [0x7fd904eb147d]
> > >>
> > >> Also from monitoring (ceph -w) I get the following messages, again lots
> > >> of them:
> > >>
> > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.? 10.20.0.13:0/1174409' entity='osd.30' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: dispatch
> > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.? 10.20.0.13:0/1174483' entity='osd.26' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: dispatch
> > >>
> > >> This is a cluster of 3 nodes with 36 OSD's; the nodes are also mons and
> > >> mds's to save on servers.  All run Ubuntu 14.04.2.
> > >>
> > >> I have pretty much tried everything I could think of.  Restarting the
> > >> daemons doesn't help.
> > >>
> > >> Any help would be appreciated.  I can also provide more logs if
> > >> necessary; they just get pretty large in a few moments.
> > >>
> > >> Thank you
> > >> Tuomas
> > >>
> > >> _______________________________________________
> > >> ceph-users mailing list
> > >> ceph-users@xxxxxxxxxxxxxx
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
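The backtrace in this thread bottoms out in interval_set<snapid_t>::subtract(). A rough Python model of that path follows; this is not Ceph's actual C++ interval_set, just an illustration of how an empty newly_removed_snaps minus a non-empty cached_removed_snaps underflows the size counter and trips assert(_size >= 0). All class and variable names here are illustrative.

```python
# Toy model of interval_set: _size tracks the total number of snap ids
# covered.  Like the real erase(), this decrements _size and asserts rather
# than checking containment first, so subtracting intervals that are not
# present drives _size negative and aborts.
class IntervalSet:
    def __init__(self):
        self._intervals = {}  # start -> length
        self._size = 0

    def insert(self, start, length=1):
        self._intervals[start] = length
        self._size += length

    def erase(self, start, length=1):
        self._intervals.pop(start, None)
        self._size -= length
        assert self._size >= 0, "FAILED assert(_size >= 0)"

    def subtract(self, other):
        for start, length in other._intervals.items():
            self.erase(start, length)

# newly_removed_snaps built from the new osdmap: rbd has no snaps -> empty.
newly_removed_snaps = IntervalSet()

# cached_removed_snaps from an older map that somehow still has snap info.
cached_removed_snaps = IntervalSet()
cached_removed_snaps.insert(1, 3)

try:
    newly_removed_snaps.subtract(cached_removed_snaps)
except AssertionError as e:
    print("OSD would abort:", e)  # prints: OSD would abort: FAILED assert(_size >= 0)
```

This matches Sage's reading above: the crash implies cached_removed_snaps holds intervals that the freshly built newly_removed_snaps does not.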