Incomplete usually means the pgs do not have any complete copies. Did you
previously have more osds?
-Sam

On Tue, Nov 4, 2014 at 7:37 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
> On Monday, November 03, 2014 17:34:06 you wrote:
>> If you have osds that are close to full, you may be hitting 9626. I
>> pushed a branch based on v0.80.7 with the fix, wip-v0.80.7-9626.
>> -Sam
>
> Thanks Sam, I may have been hitting that as well. I certainly hit too_full
> conditions often. I am able to squeeze PGs off of the too_full OSD by
> reweighting, and eventually all PGs get to where they want to be. It seems
> kind of silly that I have to do this manually, though. Could Ceph order the
> PG movements better? (Is this what your bug fix does, in effect?)
>
> So, at the moment there are no PGs moving around the cluster, but not all
> of them are active+clean. Also, there is one OSD which has blocked
> requests. The OSD seems idle, and restarting it just results in a younger
> blocked request.
>
> ~# ceph -s
>     cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
>      health HEALTH_WARN 35 pgs down; 208 pgs incomplete; 210 pgs stuck inactive; 210 pgs stuck unclean; 1 requests are blocked > 32 sec
>      monmap e3: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=144.92.180.139:6789/0}, election epoch 2996, quorum 0,1,2 mon01,mon02,mon03
>      osdmap e115306: 24 osds: 24 up, 24 in
>       pgmap v6630195: 8704 pgs, 7 pools, 6344 GB data, 1587 kobjects
>             12747 GB used, 7848 GB / 20596 GB avail
>                    2 inactive
>                 8494 active+clean
>                  173 incomplete
>                   35 down+incomplete
>
> # ceph health detail
> ...
> 1 ops are blocked > 8388.61 sec
> 1 ops are blocked > 8388.61 sec on osd.15
> 1 osds have slow requests
>
> From the log of the OSD with the blocked request (osd.15):
> 2014-11-04 08:57:26.851583 7f7686331700 0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 3840.430247 secs
> 2014-11-04 08:57:26.851593 7f7686331700 0 log [WRN] : slow request 3840.430247 seconds old, received at 2014-11-04 07:53:26.421301: osd_op(client.11334078.1:592 rb.0.206609.238e1f29.0000000752e8 [read 512~512] 4.17df39a7 RETRY=1 retry+read e115304) v4 currently reached pg
>
> Other requests (like PG scrubs) complete on this OSD without taking a long
> time. Also, this was one of the OSDs which I completely drained, removed
> from Ceph, reformatted, and recreated using ceph-deploy, so it was created
> entirely by firefly 0.80.7 code.
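>
> For reference, the manual shuffling I describe above is just the stock
> commands below (the OSD id, weight, and pg id here are examples only, not
> values from this cluster):
>
> ~# ceph osd tree                  # check current weights
> ~# ceph osd reweight 15 0.90      # temporarily lower the weight of the too_full OSD
> ~# ceph pg dump_stuck inactive    # list the PGs that are still stuck
> ~# ceph pg 4.d5 query             # inspect the peering state of one incomplete PG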
>
>
> As Greg requested, output of ceph scrub:
>
> 2014-11-04 09:25:58.761602 7f6c0e20b700 0 mon.mon01@0(leader) e3 handle_command mon_command({"prefix": "scrub"} v 0) v1
> 2014-11-04 09:26:21.320043 7f6c0ea0c700 1 mon.mon01@0(leader).paxos(paxos updating c 11563072..11563575) accept timeout, calling fresh election
> 2014-11-04 09:26:31.264873 7f6c0ea0c700 0 mon.mon01@0(probing).data_health(2996) update_stats avail 38% total 6948572 used 3891232 avail 2681328
> 2014-11-04 09:26:33.529403 7f6c0e20b700 0 log [INF] : mon.mon01 calling new monitor election
> 2014-11-04 09:26:33.538286 7f6c0e20b700 1 mon.mon01@0(electing).elector(2996) init, last seen epoch 2996
> 2014-11-04 09:26:38.809212 7f6c0ea0c700 0 log [INF] : mon.mon01@0 won leader election with quorum 0,2
> 2014-11-04 09:26:40.215095 7f6c0e20b700 0 log [INF] : monmap e3: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=144.92.180.139:6789/0}
> 2014-11-04 09:26:40.215754 7f6c0e20b700 0 log [INF] : pgmap v6630201: 8704 pgs: 2 inactive, 8494 active+clean, 173 incomplete, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:40.215913 7f6c0e20b700 0 log [INF] : mdsmap e1: 0/0/1 up
> 2014-11-04 09:26:40.216621 7f6c0e20b700 0 log [INF] : osdmap e115306: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:41.227010 7f6c0e20b700 0 log [INF] : pgmap v6630202: 8704 pgs: 2 inactive, 8494 active+clean, 173 incomplete, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:41.367373 7f6c0e20b700 1 mon.mon01@0(leader).osd e115307 e115307: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:41.437706 7f6c0e20b700 0 log [INF] : osdmap e115307: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:41.471558 7f6c0e20b700 0 log [INF] : pgmap v6630203: 8704 pgs: 2 inactive, 8494 active+clean, 173 incomplete, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:41.497318 7f6c0e20b700 1 mon.mon01@0(leader).osd e115308 e115308: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:41.533965 7f6c0e20b700 0 log [INF] : osdmap e115308: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:41.553161 7f6c0e20b700 0 log [INF] : pgmap v6630204: 8704 pgs: 2 inactive, 8494 active+clean, 173 incomplete, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:42.701720 7f6c0e20b700 1 mon.mon01@0(leader).osd e115309 e115309: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:42.953977 7f6c0e20b700 0 log [INF] : osdmap e115309: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:45.776411 7f6c0e20b700 0 log [INF] : pgmap v6630205: 8704 pgs: 2 inactive, 8494 active+clean, 173 incomplete, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:46.767534 7f6c0e20b700 1 mon.mon01@0(leader).osd e115310 e115310: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:46.817764 7f6c0e20b700 0 log [INF] : osdmap e115310: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:47.593483 7f6c0e20b700 0 log [INF] : pgmap v6630206: 8704 pgs: 2 inactive, 8489 active+clean, 1 peering, 173 incomplete, 4 remapped, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:48.170586 7f6c0e20b700 0 log [INF] : pgmap v6630207: 8704 pgs: 2 inactive, 8489 active+clean, 1 peering, 173 incomplete, 4 remapped, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:48.381781 7f6c0e20b700 1 mon.mon01@0(leader).osd e115311 e115311: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:48.484570 7f6c0e20b700 0 log [INF] : osdmap e115311: 24 osds: 24 up, 24 in
> 2014-11-04 09:26:48.857188 7f6c0e20b700 1 mon.mon01@0(leader).log v4718896 check_sub sending message to client.11353722 128.104.164.197:0/1007270 with 1 entries (version 4718896)
> 2014-11-04 09:26:50.565461 7f6c0e20b700 0 log [INF] : pgmap v6630208: 8704 pgs: 8491 active+clean, 1 peering, 173 incomplete, 4 remapped, 35 down+incomplete; 6344 GB data, 12747 GB used, 7848 GB / 20596 GB avail
> 2014-11-04 09:26:51.432688 7f6c0e20b700 1 mon.mon01@0(leader).log v4718897 check_sub sending message to client.11353722 128.104.164.197:0/1007270 with 3 entries (version 4718897)
> 2014-11-04 09:26:51.476778 7f6c0e20b700 1 mon.mon01@0(leader).osd e115312 e115312: 24 osds: 24 up, 24 in
> [... not sure how much to include ...]
>
> Looks like that cleared up two inactive PGs...
>
> # ceph -s
>     cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
>      health HEALTH_WARN 35 pgs down; 208 pgs incomplete; 208 pgs stuck inactive; 208 pgs stuck unclean; 1 requests are blocked > 32 sec
>      monmap e3: 3 mons at {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=144.92.180.139:6789/0}, election epoch 3000, quorum 0,1,2 mon01,mon02,mon03
>      osdmap e115315: 24 osds: 24 up, 24 in
>       pgmap v6630222: 8704 pgs, 7 pools, 6344 GB data, 1587 kobjects
>             12747 GB used, 7848 GB / 20596 GB avail
>                 8496 active+clean
>                  173 incomplete
>                   35 down+incomplete
>
> Thanks for your help,
> Chad.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com