Re: Some help needed with ceph deployment

Hi,

 

It seems that all my pgs are stuck in one way or another. I'm not sure what to do from here. I waited a day in the hope that Ceph would sort this out by itself, but nothing happened.

I'm testing on a single Ubuntu Server 13.04 machine with Dumpling 0.67.2. Below is my ceph status.

 

root@cephnode2:/root# ceph -s

  cluster 9087eb7a-abe1-4d38-99dc-cb6b266f0f84

   health HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean

   monmap e1: 1 mons at {cephnode2=172.16.1.2:6789/0}, election epoch 1, quorum 0 cephnode2

   osdmap e38: 6 osds: 6 up, 6 in

    pgmap v65: 192 pgs: 155 active+remapped, 37 active+degraded; 0 bytes data, 213 MB used, 11172 GB / 11172 GB avail

   mdsmap e1: 0/0/1 up

 

root@cephnode2:/root# ceph osd tree

# id    weight  type name       up/down reweight

-1      10.92   root default

-2      10.92           host cephnode2

0       1.82                    osd.0   up      1

1       1.82                    osd.1   up      1

2       1.82                    osd.2   up      1

3       1.82                    osd.3   up      1

4       1.82                    osd.4   up      1

5       1.82                    osd.5   up      1
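
As the osd tree shows, all six OSDs sit under the single host cephnode2. My guess (and it is only a guess) is that the default CRUSH rules want to place replicas on different hosts, which a cluster with one host can never satisfy, and that this is why the pgs end up remapped or degraded. To check that assumption I could look at the pool replication size and at the decompiled CRUSH rules, roughly like this (the /tmp paths are just examples):

ceph osd pool get data size
ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt
grep -A8 "rule data" /tmp/crush.txt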

 

root@cephnode2:/root# ceph health detail

HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean

pg 0.3f is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 1.3e is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 2.3d is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.3e is stuck unclean since forever, current state active+remapped, last acting [4,0]

pg 1.3f is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 2.3c is stuck unclean since forever, current state active+remapped, last acting [4,0]

pg 0.3d is stuck unclean since forever, current state active+degraded, last acting [0]

pg 1.3c is stuck unclean since forever, current state active+degraded, last acting [0]

pg 2.3f is stuck unclean since forever, current state active+remapped, last acting [4,1]

pg 0.3c is stuck unclean since forever, current state active+remapped, last acting [3,1]

pg 1.3d is stuck unclean since forever, current state active+remapped, last acting [4,0]

pg 2.3e is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.3b is stuck unclean since forever, current state active+degraded, last acting [0]

pg 1.3a is stuck unclean since forever, current state active+degraded, last acting [0]

pg 2.39 is stuck unclean since forever, current state active+degraded, last acting [0]

pg 0.3a is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 1.3b is stuck unclean since forever, current state active+remapped, last acting [3,1]

pg 2.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.39 is stuck unclean since forever, current state active+degraded, last acting [0]

pg 1.38 is stuck unclean since forever, current state active+degraded, last acting [0]

pg 2.3b is stuck unclean since forever, current state active+degraded, last acting [0]

pg 0.38 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 1.39 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 2.3a is stuck unclean since forever, current state active+remapped, last acting [3,1]

pg 0.37 is stuck unclean since forever, current state active+remapped, last acting [3,2]

[…] and many more.
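
If it helps, I can also post the output of a query on one of the stuck pgs, or a dump of all the stuck ones, e.g. (commands taken from the docs, output omitted here):

ceph pg 0.3f query
ceph pg dump_stuck unclean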

 

I found one entry on the mailing list from someone who had a similar issue and fixed it with the following commands:

 

# ceph osd getcrushmap -o /tmp/crush
# crushtool -i /tmp/crush --enable-unsafe-tunables \
    --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 \
    --set-choose-total-tries 50 -o /tmp/crush.new
# ceph osd setcrushmap -i /tmp/crush.new

 

but I'm not sure what he is trying to do here. Especially --enable-unsafe-tunables seems a little... unsafe.
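
From my (limited) reading of the CRUSH documentation, I wonder whether the real problem is simply that the default rules end with "step chooseleaf firstn 0 type host" while I only have one host. Would editing the decompiled map so it chooses OSDs instead of hosts, roughly like below, be a saner fix than the unsafe tunables? This is only a sketch of what I think is involved, I have not tried it yet:

ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt
# edit /tmp/crush.txt: in each rule change
#   step chooseleaf firstn 0 type host
# to
#   step chooseleaf firstn 0 type osd
crushtool -c /tmp/crush.txt -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new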

 

I also read http://eu.ceph.com/docs/wip-3060/ops/manage/failures/osd/#failures-osd-unfound, but it doesn't describe any concrete actions one can take to get back to a HEALTH_OK status.
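
If the single-host layout is indeed the cause, would it be cleaner to just redeploy and put something like the following in ceph.conf under [global] before creating the OSDs, so that the default CRUSH rule spreads replicas across OSDs rather than hosts? Again, only my guess from the docs:

osd crush chooseleaf type = 0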

 

 

Regards,

Johannes




