Re: Ceph remap/recovery stuck

On Fri, 24 Aug 2012, Sławomir Skowron wrote:
> I have found a workaround.
> 
> I changed the CRUSH rule for this pool to replicate across OSDs, and after
> recovery the data was remapped; then I changed the same rule back to rack
> awareness, the whole cluster recovered again, and everything went back to
> normal.
> 
> Is there any way to start refill/recovery in this situation for this
> specific OSD?

This sounds like it might be a problem with the CRUSH retry behavior.  
In some cases it would fail to generate the right number of replicas for a 
given input.  We fixed this by adding tunables that disable the old/bad 
behavior, but we haven't enabled them by default because support is only now 
showing up in new kernels.  If you aren't using older kernel clients, you 
can enable the new values on your cluster by following the instructions 
at:

	http://ceph.com/docs/master/ops/manage/crush/#tunables
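
Concretely, that means adjusting the map offline with crushtool.  A rough 
sketch (flag names and values as I recall them from that docs page, so 
double-check the exact numbers there before running):

 ceph osd getcrushmap -o original.map
 crushtool -i original.map --set-choose-local-tries 0 \
     --set-choose-local-fallback-tries 0 --set-choose-total-tries 50 \
     -o adjusted.map

and then inject adjusted.map back with 'ceph osd setcrushmap', ideally after 
testing it as described below.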

FWIW you can test whether this helps by extracting your crushmap from 
the cluster, making whatever changes you are planning to the map, and then 
running

 crushtool -i newmap --test

and verify that you get the right number of results for numrep=3 and 
below.  There are a bunch of options you can pass to adjust the range of 
inputs that are tested (e.g., --min-x 1 --max-x 100000, --num-rep 3, 
etc.).  crushtool can also be used to set the tunables to 0, so you can 
verify that this fixes the problem... all before injecting the new map 
into the cluster and actually triggering any data migration.
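
The whole workflow looks something like this (just a sketch; the file names 
are arbitrary, and the rule edits happen in the decompiled text map):

 ceph osd getcrushmap -o current.map
 crushtool -d current.map -o current.txt    # decompile; make your rule edits here
 crushtool -c current.txt -o newmap         # recompile the edited map
 crushtool -i newmap --test --num-rep 3 --min-x 1 --max-x 100000
 ceph osd setcrushmap -i newmap             # only once every input maps to 3 osds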

sage


> 
> On Thu, Aug 23, 2012 at 3:52 PM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
> > 3 OSDs rebuilt OK after a crash, but after rebuilding two more OSDs (12
> > and 30) I can't get the cluster to active+clean.
> >
> > I did the rebuild as described in the docs:
> >
> > stop the osd,
> > remove it from crush,
> > rm it from the map,
> > recreate the osd after the cluster gets stable
> >
> > But now all OSDs are in and up, the data won't remap, and some PGs have
> > only two OSDs in the acting set with replication level 3 for this pool.
> >
> > 2012-08-23 15:26:46.073685 mon.0 [INF] pgmap v117192: 6472 pgs: 63
> > active, 4457 active+clean, 1942 active+remapped, 10 active+degraded;
> > 596 GB data, 1650 GB used, 20059 GB / 21710 GB avail; 57815/4705888
> > degraded (1.229%)
> >
> > Attached is the output from:
> >
> > ceph osd dump -o -
> >
> > I can't find any info in the docs for this situation.
> >
> > HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean; recovery
> > 57871/4706179 degraded (1.230%)
> > root@s3-10-177-64-6:~# ceph -s
> >    health HEALTH_WARN 10 pgs degraded; 2015 pgs stuck unclean;
> > recovery 57871/4706179 degraded (1.230%)
> >    monmap e4: 3 mons at
> > {0=10.177.64.4:6789/0,1=10.177.64.6:6789/0,2=10.177.64.8:6789/0},
> > election epoch 16, quorum 0,1,2 0,1,2
> >    osdmap e1300: 78 osds: 78 up, 78 in
> >     pgmap v117464: 6472 pgs: 63 active, 4457 active+clean, 1942
> > active+remapped, 10 active+degraded; 596 GB data, 1651 GB used, 20059
> > GB / 21710 GB avail; 57871/4706179 degraded (1.230%)
> >    mdsmap e1: 0/0/1 up
> >
> > Please help; I will try to give you any output you need.
> >
> >
> > And one more thing, a little bug in 0.48.1:
> >
> > The 'ceph health blabla' command does the same thing as 'ceph health details'.
> > Whatever comes after 'health' is treated as 'details'.
> >
> > --
> > -----
> > Regards
> >
> > Sławek "sZiBis" Skowron
> 
> 
> 
> -- 
> -----
> Regards
> 
> Sławek "sZiBis" Skowron