Can you post 'ceph osd dump --format=json-pretty'?  I'm guessing that the
replication level or crush rules are such that a single host with 6 osds
can't satisfy it.

sage
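A minimal sketch of that check (assuming the stock CRUSH rule, which spreads
replicas across hosts; the /tmp paths are only examples):

    ceph osd dump | grep pool                  # shows each pool's replication size
    ceph osd getcrushmap -o /tmp/crush         # fetch the compiled CRUSH map
    crushtool -d /tmp/crush -o /tmp/crush.txt  # decompile it to readable text
    grep chooseleaf /tmp/crush.txt             # "step chooseleaf firstn 0 type host" means
                                               # every replica must land on a different host,
                                               # which one host with 6 osds can never satisfy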
On Tue, 27 Aug 2013, Johannes Klarenbeek wrote:
> Hi,
>
> It seems that all my pgs are stuck somewhat. I'm not sure what to do from
> here. I waited a day in the hope that ceph would find a way to deal with
> this... but nothing happened.
>
> I'm testing on a single ubuntu server 13.04 with dumpling 0.67.2. Below is
> my ceph status.
>
> root@cephnode2:/root# ceph -s
>   cluster 9087eb7a-abe1-4d38-99dc-cb6b266f0f84
>   health HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
>   monmap e1: 1 mons at {cephnode2=172.16.1.2:6789/0}, election epoch 1, quorum 0 cephnode2
>   osdmap e38: 6 osds: 6 up, 6 in
>   pgmap v65: 192 pgs: 155 active+remapped, 37 active+degraded; 0 bytes data, 213 MB used, 11172 GB / 11172 GB avail
>   mdsmap e1: 0/0/1 up
>
> root@cephnode2:/root# ceph osd tree
> # id    weight  type name             up/down  reweight
> -1      10.92   root default
> -2      10.92           host cephnode2
> 0       1.82                  osd.0   up       1
> 1       1.82                  osd.1   up       1
> 2       1.82                  osd.2   up       1
> 3       1.82                  osd.3   up       1
> 4       1.82                  osd.4   up       1
> 5       1.82                  osd.5   up       1
>
> root@cephnode2:/root# ceph health detail
> HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
> pg 0.3f is stuck unclean since forever, current state active+remapped, last acting [2,0]
> pg 1.3e is stuck unclean since forever, current state active+remapped, last acting [2,0]
> pg 2.3d is stuck unclean since forever, current state active+remapped, last acting [2,0]
> pg 0.3e is stuck unclean since forever, current state active+remapped, last acting [4,0]
> pg 1.3f is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 2.3c is stuck unclean since forever, current state active+remapped, last acting [4,0]
> pg 0.3d is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 1.3c is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 2.3f is stuck unclean since forever, current state active+remapped, last acting [4,1]
> pg 0.3c is stuck unclean since forever, current state active+remapped, last acting [3,1]
> pg 1.3d is stuck unclean since forever, current state active+remapped, last acting [4,0]
> pg 2.3e is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 0.3b is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 1.3a is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 2.39 is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 0.3a is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 1.3b is stuck unclean since forever, current state active+remapped, last acting [3,1]
> pg 2.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 0.39 is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 1.38 is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 2.3b is stuck unclean since forever, current state active+degraded, last acting [0]
> pg 0.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 1.39 is stuck unclean since forever, current state active+remapped, last acting [1,0]
> pg 2.3a is stuck unclean since forever, current state active+remapped, last acting [3,1]
> pg 0.37 is stuck unclean since forever, current state active+remapped, last acting [3,2]
> [...] and many more.
>
> I found one entry on the mailing list from someone that had a similar issue
> and he fixed it with the following commands:
>
> #ceph osd getcrushmap -o /tmp/crush
> #crushtool -i /tmp/crush --enable-unsafe-tunables
>    --set-choose-local-tries 0 --set-choose-local-fallback-tries 0
>    --set-choose-total-tries 50 -o /tmp/crush.new
> root@ceph-admin:/etc/ceph# ceph osd setcrushmap -i /tmp/crush.new
>
> but I'm not sure what he is trying to do here. Especially
> --enable-unsafe-tunables seems a little ... unsafe.
>
> I also read this link:
> http://eu.ceph.com/docs/wip-3060/ops/manage/failures/osd/#failures-osd-unfound
> But it doesn't detail any actions that one can take in order to get back
> to a HEALTH_OK status.
>
> Regards,
> Johannes
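If the decompiled rule does turn out to require distinct hosts, one common
workaround for a single-node test cluster (an assumption here, not something
confirmed in this thread) is to rewrite the rule to choose osds instead of
hosts and inject the edited map, rather than touching the tunables:

    ceph osd getcrushmap -o /tmp/crush
    crushtool -d /tmp/crush -o /tmp/crush.txt
    # in /tmp/crush.txt, change each rule line reading
    #     step chooseleaf firstn 0 type host
    # to
    #     step chooseleaf firstn 0 type osd
    crushtool -c /tmp/crush.txt -o /tmp/crush.new
    ceph osd setcrushmap -i /tmp/crush.new

For a fresh single-node deployment, setting 'osd crush chooseleaf type = 0'
in ceph.conf before creating the cluster is meant to produce a map like this
from the start.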