Thanks for the response. All OSDs seem to be OK: they have been restarted,
joined the cluster afterwards, and there is nothing weird in the logs.

# ceph pg dump_stuck stale
ok

# ceph pg dump_stuck inactive
ok
pg_stat  state       up             up_primary  acting         acting_primary
3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
3.1683   incomplete  [166,329,281]  166         [166,329,281]  166

# ceph pg dump_stuck unclean
ok
pg_stat  state       up             up_primary  acting         acting_primary
3.2929   incomplete  [109,272,83]   109         [109,272,83]   109
3.1683   incomplete  [166,329,281]  166         [166,329,281]  166

On OSD 166 there are 100 blocked ops (on 109 too); they all end on
"event": "reached_pg":

# ceph --admin-daemon /var/run/ceph/ceph-osd.166.asok dump_ops_in_flight
...
        {
            "description": "osd_op(client.958764031.0:18137113 rbd_data.392585982ae8944a.0000000000000ad4 [set-alloc-hint object_size 4194304 write_size 4194304,write 2641920~8192] 3.d6195683 RETRY=15 ack+ondisk+retry+write+known_if_redirected e613241)",
            "initiated_at": "2016-06-21 10:19:59.894393",
            "age": 828.025527,
            "duration": 600.020809,
            "type_data": [
                "reached pg",
                {
                    "client": "client.958764031",
                    "tid": 18137113
                },
                [
                    {
                        "time": "2016-06-21 10:19:59.894393",
                        "event": "initiated"
                    },
                    {
                        "time": "2016-06-21 10:29:59.915202",
                        "event": "reached_pg"
                    }
                ]
            ]
        }
    ],
    "num_ops": 100
}
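If we do end up trying the min_size hint from "ceph health detail" (quoted
below), I assume it would just be a temporary change on the vms pool,
reverted as soon as the two PGs peer; roughly:

# ceph osd pool set vms min_size 1
  (wait for 3.2929 and 3.1683 to peer and go active)
# ceph osd pool set vms min_size 2

We haven't run this yet, so treat it as a sketch, not something we have
verified on this cluster.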
On 06/21/2016 12:27 PM, M Ranga Swami Reddy wrote:
> You can use the below cmds:
> ==
>
> ceph pg dump_stuck stale
> ceph pg dump_stuck inactive
> ceph pg dump_stuck unclean
> ===
>
> And then query the PGs which are in unclean or stale state, and check
> for any issue with a specific OSD.
>
> Thanks
> Swami
>
> On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
>> Hello,
>>
>> We have an issue on one of our clusters. One node with 9 OSDs was down
>> for more than 12 hours. During that time the cluster recovered without
>> problems. When the host came back to the cluster we got two PGs in
>> incomplete state. We decided to mark the OSDs on this host as out, but
>> the two PGs are still incomplete. Trying to query those PGs hangs
>> forever. We already tried restarting the OSDs. Is there any way to
>> solve this issue without losing data? Any help appreciated :)
>>
>> # ceph health detail | grep incomplete
>> HEALTH_WARN 2 pgs incomplete; 2 pgs stuck inactive; 2 pgs stuck unclean;
>> 200 requests are blocked > 32 sec; 2 osds have slow requests;
>> noscrub,nodeep-scrub flag(s) set
>> pg 3.2929 is stuck inactive since forever, current state incomplete,
>> last acting [109,272,83]
>> pg 3.1683 is stuck inactive since forever, current state incomplete,
>> last acting [166,329,281]
>> pg 3.2929 is stuck unclean since forever, current state incomplete,
>> last acting [109,272,83]
>> pg 3.1683 is stuck unclean since forever, current state incomplete,
>> last acting [166,329,281]
>> pg 3.1683 is incomplete, acting [166,329,281] (reducing pool vms
>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
>> pg 3.2929 is incomplete, acting [109,272,83] (reducing pool vms
>> min_size from 2 may help; search ceph.com/docs for 'incomplete')
>>
>> The directory for PG 3.1683 is present on OSD 166 and contains ~8GB.
>>
>> We didn't try setting min_size to 1 yet (we treat it as a last resort).
>>
>> Some cluster info:
>>
>> # ceph --version
>> ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
>>
>> # ceph -s
>>     health HEALTH_WARN
>>            2 pgs incomplete
>>            2 pgs stuck inactive
>>            2 pgs stuck unclean
>>            200 requests are blocked > 32 sec
>>            noscrub,nodeep-scrub flag(s) set
>>     monmap e7: 5 mons at
>> {mon-03=*.2:6789/0,mon-04=*.36:6789/0,mon-05=*.81:6789/0,mon-06=*.0:6789/0,mon-07=*.40:6789/0}
>>            election epoch 3250, quorum 0,1,2,3,4
>>            mon-06,mon-07,mon-04,mon-03,mon-05
>>     osdmap e613040: 346 osds: 346 up, 337 in
>>            flags noscrub,nodeep-scrub
>>      pgmap v27163053: 18624 pgs, 6 pools, 138 TB data, 39062 kobjects
>>            415 TB used, 186 TB / 601 TB avail
>>                18622 active+clean
>>                    2 incomplete
>>   client io 9992 kB/s rd, 64867 kB/s wr, 8458 op/s
>>
>> # ceph osd pool get vms pg_num
>> pg_num: 16384
>>
>> # ceph osd pool get vms size
>> size: 3
>>
>> # ceph osd pool get vms min_size
>> min_size: 2

--
PS
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com