Hello,

we have had some trouble with OSDs running full, even after rebalancing. With the disks at 100% usage and the ceph-osd daemons no longer starting, we decided to delete some PG directories, after which rebalancing finished. However, since then one PG is not becoming clean anymore.

So far we have tried:

a) stop, then stop+out osd.7 -> after rebalancing the pg is still stuck

b) mark objects lost:

   root@wein:~# ceph pg 3.14 mark_unfound_lost revert
   pg has no unfound objects

c) stop osd.7, rsync the directory 3.14_head over from osd.2, start osd.7

d) ceph pg scrub 3.14

So far the status is still that this PG is down. I have included some of the relevant lines / logs below. I would be grateful for any hints on how to repair this situation.

Cheers,

Nico

p.s.: Using ceph 0.80.7.

Action causing the problem:

root@wein:/var/lib/ceph/osd/ceph-7/current# ls
0.12_head  0.a_head   2.1c_head  3.2a_head  3.4c_head  3.6b_TEMP  3.8b_head  3.97_TEMP  3.c7_TEMP
0.14_head  1.10_head  2.26_head  3.32_head  3.4c_TEMP  3.6c_head  3.8d_head  3.9b_head  3.c_head
0.21_head  1.1a_head  2.2a_head  3.32_TEMP  3.56_head  3.6c_TEMP  3.8d_TEMP  3.9b_TEMP  3.d_head
0.23_head  1.21_head  2.2e_head  3.37_head  3.56_TEMP  3.6_head   3.8e_head  3.a9_head  3.d_TEMP
0.2b_head  1.2b_head  2.2f_head  3.37_TEMP  3.5b_head  3.7b_head  3.8_head   3.a9_TEMP  3.f_head
0.2d_head  1.2c_head  2.33_head  3.47_head  3.5b_TEMP  3.7b_TEMP  3.91_head  3.ab_TEMP  3.f_TEMP
0.2e_head  1.32_head  2.3f_head  3.47_TEMP  3.60_head  3.80_head  3.91_TEMP  3.b2_TEMP  commit_op_seq
0.2_head   1.37_head  2.b_head   3.49_head  3.61_head  3.81_head  3.93_head  3.b7_TEMP  meta
0.38_head  1.3c_head  3.0_head   3.49_TEMP  3.61_TEMP  3.82_head  3.93_TEMP  3.bf_head  nosnap
0.3b_head  1.e_head   3.12_head  3.4a_head  3.67_head  3.82_TEMP  3.94_head  3.bf_TEMP  omap
0.3e_head  2.10_head  3.14_head  3.4a_TEMP  3.67_TEMP  3.89_head  3.94_TEMP  3.b_head
0.7_head   2.15_head  3.14_TEMP  3.4b_head  3.6b_head  3.89_TEMP  3.97_head  3.b_TEMP

root@wein:/var/lib/ceph/osd/ceph-7/current# du -sh 3.14_*
3.9G    3.14_head
4.0K    3.14_TEMP

The current status:

root@kaffee:~# ceph -s
    cluster e0611730-09ff-4f3c-bfdb-2dd415274a36
     health HEALTH_WARN 1 pgs down; 1 pgs peering; 1 pgs stuck inactive; 1 pgs stuck unclean; 5 requests are blocked > 32 sec
     monmap e3: 3 mons at {kaffee=192.168.40.1:6789/0,tee=192.168.40.2:6789/0,wein=192.168.40.3:6789/0}, election epoch 3652, quorum 0,1,2 kaffee,tee,wein
     osdmap e1129: 8 osds: 7 up, 7 in
      pgmap v435448: 448 pgs, 4 pools, 976 GB data, 248 kobjects
            1938 GB used, 9913 GB / 11852 GB avail
                 447 active+clean
                   1 down+peering

root@wein:/var/lib/ceph/osd/ceph-7/current# ceph health detail
HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 5 requests are blocked > 32 sec; 1 osds have slow requests
pg 3.14 is stuck inactive for 135697.438689, current state incomplete, last acting [2,7]
pg 3.14 is stuck unclean for 135697.438702, current state incomplete, last acting [2,7]
pg 3.14 is incomplete, acting [2,7]
5 ops are blocked > 8388.61 sec
5 ops are blocked > 8388.61 sec on osd.2
1 osds have slow requests

root@wein:~# ceph pg dump_stuck stale
ok

root@wein:~# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.14 1006 0 0 0 4135415824 3001 3001 incomplete 2014-12-19 14:40:00.272775 589'27399 1150:66317 [2,7] 2 [2,7] 2 503'24268 2014-12-13 19:17:39.272720 503'24268 2014-12-13 19:17:38.672258

root@wein:~# ceph pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.14 1006 0 0 0 4135415824 3001 3001 incomplete 2014-12-19 14:40:00.272775 589'27399 1150:66317 [2,7] 2 [2,7] 2 503'24268 2014-12-13 19:17:39.272720 503'24268 2014-12-13 19:17:38.672258

root@wein:~#
root@wein:~# ceph osd tree
# id    weight  type name      up/down reweight
-1      2.3     root default
-2      0.2999          host wein
0       0.04999                 osd.0   up      1
3       0.04999                 osd.3   up      1
4       0.04999                 osd.4   up      1
5       0.04999                 osd.5   up      1
6       0.04999                 osd.6   up      1
7       0.04999                 osd.7   up      1
-3      1               host tee
1       5.5                     osd.1   up      1
-4      1               host kaffee
2       5.5                     osd.2   up      1
root@wein:~#

Fixes we tried:

root@wein:~# ceph pg 3.14 mark_unfound_lost revert
pg has no unfound objects

root@kaffee:~# rsync -av /var/lib/ceph/osd/ceph-2/current/3.14_head/ root@xxxxxxxxxxxxxxxx:/var/lib/ceph/osd/ceph-7/current/3.14_head/
+ stop & restart osd.7 around it

root@wein:~# ceph pg deep-scrub 3.14
instructing pg 3.14 on osd.2 to deep-scrub
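For completeness, the "stop & restart osd.7 around it" was roughly the following sequence (shown with the sysvinit-style service command; the exact invocation depends on the init system in use, and the target hostname is elided as above):

root@wein:~# service ceph stop osd.7     # stop osd.7 before touching its data dir
root@kaffee:~# rsync -av /var/lib/ceph/osd/ceph-2/current/3.14_head/ root@xxxxxxxxxxxxxxxx:/var/lib/ceph/osd/ceph-7/current/3.14_head/
root@wein:~# service ceph start osd.7    # start it again so it can peer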
--
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com