Hello,

On Thu, 14 Aug 2014 03:38:11 +0000 David Moreau Simard wrote:

> Hi,
>
> Trying to update my continuous integration environment.. same deployment
> method with the following specs:
> - Ubuntu Precise, Kernel 3.2, Emperor (0.72.2) - Yields a successful,
> healthy cluster.
> - Ubuntu Trusty, Kernel 3.13, Firefly (0.80.5) - I have stuck placement
> groups.
>
> Here's some relevant bits from the Trusty/Firefly setup before I move on
> to what I've done/tried: http://pastebin.com/eqQTHcxU <- This was about
> halfway through PG healing.
>
> So, the setup is three monitors, plus two other hosts with 9 OSDs each.
> At the beginning, all my placement groups were stuck unclean.

And there's your reason why the Firefly install "failed". The default
replication size is 3 and you have just 2 storage nodes; the default
CRUSH rules place each replica on a different host, so a size-3 pool can
never go clean on 2 hosts, which is exactly what you saw.

To avoid this from the start, either use 3 storage nodes or set
---
osd_pool_default_size = 2
osd_pool_default_min_size = 1
---
in your ceph.conf very early on, before creating anything, especially
OSDs.

Setting the replication for all your pools to 2 with
"ceph osd pool set <name> size 2" as the first step after your install
should have worked, too. But with all the things you tried, I can't
really tell you why things behaved the way they did for you.

Christian

> I tried the easy things first:
> - set crush tunables to optimal
> - run repairs/scrubs on the OSDs
> - restart the OSDs
>
> Nothing happened. All ~12000 PGs remained stuck unclean since forever,
> active+remapped. Next, I played with the CRUSH map. I deleted the
> default replicated_ruleset rule and created a (basic) rule for each
> pool for the time being. I set the pools to use their respective rules
> and also reduced their size to 2 and min_size to 1.
>
> Still nothing, all PGs stuck.
>
> I'm not sure why, but I tried setting the crush tunables to legacy - I
> guess in a trial and error attempt.
>
> Half my PGs healed almost immediately. 6082 PGs remained in
> active+remapped. I tried running scrubs/repairs - they wouldn't heal
> the other half. I set the tunables back to optimal; still nothing.
>
> I set the tunables to legacy again and most of them healed, with only
> 1335 left in active+remapped.
>
> The remaining PGs healed when I restarted the OSDs.
>
> Does anyone have a clue why this happened?
> It looks like switching back and forth between tunables fixed the stuck
> PGs?
>
> I can easily reproduce this if anyone wants more info.
>
> Let me know!
> --
> David Moreau Simard
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Christian Balzer        Network/Systems Engineer
chibi at gol.com         Global OnLine Japan/Fusion Communications
http://www.gol.com/
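
For reference, a minimal shell sketch of the per-pool fix described above.
It assumes every existing pool should be resized (an assumption, not part
of the original advice) and relies on "rados lspools" printing one pool
name per line:

---
# Drop replication on every existing pool to fit a 2-node cluster.
# Note: pools created later still get the defaults, so set
# osd_pool_default_size / osd_pool_default_min_size in ceph.conf too.
for pool in $(rados lspools); do
    ceph osd pool set "$pool" size 2
    ceph osd pool set "$pool" min_size 1
done
---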
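
The tunables switching from the quoted report corresponds to the commands
below; a sketch using the profile names available in the Firefly-era CLI:

---
# Each profile switch rewrites the CRUSH tunables and forces affected
# PGs to re-peer, which can move data around the cluster.
ceph osd crush tunables legacy
ceph osd crush tunables optimal
---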