Hi Joao,

In the meantime I have done the following things:

$ ceph osd crush move ceph-osd15 rack=rack1-pdu1
moved item id -17 name 'ceph-osd15' to location {rack=rack1-pdu1} in crush map

$ ceph osd crush rm rack2-pdu3
removed item id -23 name 'rack2-pdu3' from crush map

But it does not solve the problem either. I saw in the documentation that
restarting the OSDs where the PGs are stuck could help... I did restart all
the OSDs, but that leads to the following status:

    cluster 4a8669b9-b379-43b2-9488-7fca6e1366bc
     health HEALTH_WARN 80 pgs degraded; 152 pgs peering; 411 pgs stale; 166 pgs stuck inactive; 411 pgs stuck stale; 620 pgs stuck unclean; recovery 51106/694410 objects degraded (7.360%)
     monmap e2: 3 mons at {ceph-mon0=10.1.2.1:6789/0,ceph-mon1=10.1.2.2:6789/0,ceph-mon2=10.1.2.3:6789/0}, election epoch 68, quorum 0,1,2 ceph-mon0,ceph-mon1,ceph-mon2
     osdmap e1825: 16 osds: 16 up, 16 in
      pgmap v301798: 712 pgs, 5 pools, 1350 GB data, 338 kobjects
            2763 GB used, 5615 GB / 8379 GB avail
            51106/694410 objects degraded (7.360%)
                 152 stale+peering
                  73 stale+active+remapped
                  80 stale+active+degraded+remapped
                  92 stale+active+clean
                 301 active+remapped
                  14 stale

You'll find my crush map here: http://pastebin.com/F9aFjcjm

Two quick command sketches (querying the stuck PGs, and testing the ruleset
offline with crushtool) are appended after the quoted message below.

Cheers,

Olivier.

----- Original Message -----
> From: "Joao Eduardo Luis" <joao.luis at inktank.com>
> To: "Olivier DELHOMME" <olivier.delhomme at mines-paristech.fr>, ceph-users at lists.ceph.com
> Sent: Wednesday, 23 July 2014 19:39:52
> Subject: Re: [ceph-users] MON segfaulting when setting a crush ruleset to a pool (firefly 0.80.4)
>
> Hey Olivier,
>
> On 07/23/2014 02:06 PM, Olivier DELHOMME wrote:
> > Hello,
> >
> > I'm running a test cluster (mon and osd are debian 7
> > with 3.2.57-3+deb7u2 kernel). The client is a debian 7
> > with a 3.15.4 kernel that I compiled myself.
> >
> > The cluster has 3 monitors and 16 osd servers.
> > I created a pool (periph) and used it a bit and then
> > I decided to create some buckets and moved the hosts
> > into:
>
> Can you share your crush map?
>
> Cheers!
>   -Joao
>
>
> > $ ceph osd crush add-bucket rack1-pdu1 rack
> > $ ceph osd crush add-bucket rack1-pdu2 rack
> > $ ceph osd crush add-bucket rack1-pdu3 rack
> > $ ceph osd crush add-bucket rack2-pdu1 rack
> > $ ceph osd crush add-bucket rack2-pdu2 rack
> > $ ceph osd crush add-bucket rack2-pdu3 rack
> > $ ceph osd crush move ceph-osd0 rack=rack1-pdu1
> > $ ceph osd crush move ceph-osd1 rack=rack1-pdu1
> > $ ceph osd crush move ceph-osd2 rack=rack1-pdu1
> > $ ceph osd crush move ceph-osd3 rack=rack1-pdu2
> > $ ceph osd crush move ceph-osd4 rack=rack1-pdu2
> > $ ceph osd crush move ceph-osd5 rack=rack1-pdu2
> > $ ceph osd crush move ceph-osd6 rack=rack1-pdu3
> > $ ceph osd crush move ceph-osd7 rack=rack1-pdu3
> > $ ceph osd crush move ceph-osd8 rack=rack1-pdu3
> > $ ceph osd crush move ceph-osd9 rack=rack2-pdu1
> > $ ceph osd crush move ceph-osd10 rack=rack2-pdu1
> > $ ceph osd crush move ceph-osd11 rack=rack2-pdu1
> > $ ceph osd crush move ceph-osd12 rack=rack2-pdu2
> > $ ceph osd crush move ceph-osd13 rack=rack2-pdu2
> > $ ceph osd crush move ceph-osd14 rack=rack2-pdu2
> > $ ceph osd crush move ceph-osd15 rack=rack2-pdu3
> >
> > It did well:
> >
> > $ ceph osd tree
> > # id    weight  type name               up/down reweight
> > -23     0.91        rack rack2-pdu3
> > -17     0.91            host ceph-osd15
> > 15      0.91                osd.15      up      1
> > -22     1.81        rack rack2-pdu2
> > -14     0.45            host ceph-osd12
> > 12      0.45                osd.12      up      1
> > -15     0.45            host ceph-osd13
> > 13      0.45                osd.13      up      1
> > -16     0.91            host ceph-osd14
> > 14      0.91                osd.14      up      1
> > -21     1.35        rack rack2-pdu1
> > -11     0.45            host ceph-osd9
> > 9       0.45                osd.9       up      1
> > -12     0.45            host ceph-osd10
> > 10      0.45                osd.10      up      1
> > -13     0.45            host ceph-osd11
> > 11      0.45                osd.11      up      1
> > -20     1.35        rack rack1-pdu3
> > -8      0.45            host ceph-osd6
> > 6       0.45                osd.6       up      1
> > -9      0.45            host ceph-osd7
> > 7       0.45                osd.7       up      1
> > -10     0.45            host ceph-osd8
> > 8       0.45                osd.8       up      1
> > -19     1.35        rack rack1-pdu2
> > -5      0.45            host ceph-osd3
> > 3       0.45                osd.3       up      1
> > -6      0.45            host ceph-osd4
> > 4       0.45                osd.4       up      1
> > -7      0.45            host ceph-osd5
> > 5       0.45                osd.5       up      1
> > -18     1.35        rack rack1-pdu1
> > -2      0.45            host ceph-osd0
> > 0       0.45                osd.0       up      1
> > -3      0.45            host ceph-osd1
> > 1       0.45                osd.1       up      1
> > -4      0.45            host ceph-osd2
> > 2       0.45                osd.2       up      1
> > -1      0       root default
> >
> > But then, when trying to set the crush_ruleset to the
> > pool with the command below, it crashes two of the three
> > monitors.
> >
> > $ ceph osd pool set periph crush_ruleset 2
> > 2014-07-23 14:43:38.942811 7fa9696a3700  0 monclient: hunting for new mon
> >
> > The first monitor's log ends with:
> >
> >     -4> 2014-07-23 14:43:37.476121 7f52d2f46700  1 -- 10.1.2.1:6789/0 --> 10.1.2.100:0/1027991 -- mon_command_ack([{"prefix": "get_command_descriptions"}]=0 v0) v1 -- ?+29681 0x3b1c780 con 0x2a578c0
> >     -3> 2014-07-23 14:43:37.598549 7f52d2f46700  1 -- 10.1.2.1:6789/0 <== client.39105 10.1.2.100:0/1027991 8 ==== mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": "periph", "val": "2"} v 0) v1 ==== 122+0+0 (2844980124 0 0) 0x3b1d860 con 0x2a578c0
> >     -2> 2014-07-23 14:43:37.598602 7f52d2f46700  0 mon.ceph-mon0@0(leader) e2 handle_command mon_command({"var": "crush_ruleset", "prefix": "osd pool set", "pool": "periph", "val": "2"} v 0) v1
> >     -1> 2014-07-23 14:43:37.598705 7f52d2f46700  1 mon.ceph-mon0@0(leader).paxos(paxos active c 542663..543338) is_readable now=2014-07-23 14:43:37.598708 lease_expire=2014-07-23 14:43:41.683421 has v0 lc 543338
> >      0> 2014-07-23 14:43:37.601706 7f52d2f46700 -1 *** Caught signal (Segmentation fault) **
> >  in thread 7f52d2f46700
> >
> > Then, after an election attempt, the second monitor goes down as well:
> >
> >     -1> 2014-07-23 14:43:51.772370 7eff4ba15700  1 mon.ceph-mon1@1(leader).paxos(paxos active c 542663..543338) is_readable now=2014-07-23 14:43:51.772373 lease_expire=2014-07-23 14:43:56.770906 has v0 lc 543338
> >      0> 2014-07-23 14:43:51.775817 7eff4ba15700 -1 *** Caught signal (Segmentation fault) **
> >
> > I cannot bring the monitors back up while the command
> > "ceph osd pool set periph crush_ruleset 2" is still running.
> >
> > When I kill this command, the monitors can run again and
> > return to a normal state, but it leaves the cluster with
> > some warnings, I guess about data movement that was not
> > completed (I had some data in the pool):
> >
> >     cluster 4a8669b9-b379-43b2-9488-7fca6e1366bc
> >      health HEALTH_WARN 152 pgs peering; 166 pgs stuck inactive; 620 pgs stuck unclean; recovery 620/694410 objects degraded (0.089%)
> >      monmap e2: 3 mons at {ceph-mon0=10.1.2.1:6789/0,ceph-mon1=10.1.2.2:6789/0,ceph-mon2=10.1.2.3:6789/0}, election epoch 50, quorum 0,1,2 ceph-mon0,ceph-mon1,ceph-mon2
> >      osdmap e1688: 16 osds: 16 up, 16 in
> >       pgmap v300875: 712 pgs, 5 pools, 1350 GB data, 338 kobjects
> >             2765 GB used, 5614 GB / 8379 GB avail
> >             620/694410 objects degraded (0.089%)
> >                   14 inactive
> >                  152 peering
> >                  454 active+remapped
> >                   92 active+clean
> >
> > Is there something that I did wrong or forgot to do?
> >
> > While writing this mail I realised that there is only one
> > host in the rack2-pdu3 rack. Could this be a cause of the
> > problem?
> >
> > Thanks for any hints.
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
>
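
The first sketch referenced above: listing the stuck PGs and asking one of
them why it is stuck. These are standard ceph CLI calls; the PG id in the
query command is only a placeholder, to be replaced by one reported by
dump_stuck.

$ ceph pg dump_stuck stale
$ ceph pg dump_stuck inactive
$ ceph pg dump_stuck unclean
$ ceph pg 3.7f query     # 3.7f is a placeholder, use a PG id from the dump_stuck output
$ ceph osd tree          # double-check that rack2-pdu3 is really gone from the tree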
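
The second sketch referenced above: exercising the crush rule offline with
crushtool before pointing the pool at it again, to see whether it can place
the requested number of replicas at all. This is only a rough sketch; it
assumes the rule id is 2 and the pool size is 3, adjust to your values.

$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt    # decompiled map, to read rule 2 by eye
$ crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-bad-mappings

If --show-bad-mappings prints anything, the rule cannot satisfy all the
requested replicas with the current hierarchy, which could support the
suspicion about the single-host rack; the monitor segfault itself is a
separate question.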