On Wed, Feb 8, 2017 at 8:17 AM, <george.vasilakakos@xxxxxxxxxx> wrote:
> Hi Ceph folks,
>
> I have a cluster running Jewel 10.2.5 with a mix of EC and replicated pools.
>
> After rebooting a host last night, one PG refuses to complete peering:
>
> pg 1.323 is stuck inactive for 73352.498493, current state peering, last acting [595,1391,240,127,937,362,267,320,7,634,716]
>
> Restarting OSDs or hosts does nothing to help, or sometimes results in things like this:
>
> pg 1.323 is remapped+peering, acting [2147483647,1391,240,127,937,362,267,320,7,634,716]
>
>
> The host that was rebooted is home to osd.7 (rank 8 in the stuck PG). If I go onto it to look at the logs for osd.7, this is what I see:
>
> $ tail -f /var/log/ceph/ceph-osd.7.log
> 2017-02-08 15:41:00.445247 7f5fcc2bd700 0 -- XXX.XXX.XXX.172:6905/20510 >> XXX.XXX.XXX.192:6921/55371 pipe(0x7f6074a0b400 sd=34 :42828 s=2 pgs=319 cs=471 l=0 c=0x7f6070086700).fault, initiating reconnect
>
> I'm assuming that in IP1:port1/PID1 >> IP2:port2/PID2 the >> indicates the direction of communication. I've traced these to osd.7 (rank 8 in the stuck PG) reaching out to osd.595 (the primary in the stuck PG).
>
> Meanwhile, looking at the logs of osd.595, I see this:
>
> $ tail -f /var/log/ceph/ceph-osd.595.log
> 2017-02-08 15:41:15.760708 7f1765673700 0 -- XXX.XXX.XXX.192:6921/55371 >> XXX.XXX.XXX.172:6905/20510 pipe(0x7f17b2911400 sd=101 :6921 s=0 pgs=0 cs=0 l=0 c=0x7f17b7beaf00).accept connect_seq 478 vs existing 477 state standby
> 2017-02-08 15:41:20.768844 7f1765673700 0 bad crc in front 1941070384 != exp 3786596716
>
> which again shows osd.595 reaching out to osd.7, and from what I could gather the CRC problem is about messaging.

Yes, "bad crc" indicates that the checksums on an incoming message did not match what was provided, i.e. the message got corrupted in transit. You shouldn't try to fix that by playing around with the peering settings; it's not a peering bug. Unless there's a bug in the messaging layer causing this (very unlikely), you have bad hardware or a bad network configuration (people occasionally report trouble with mismatched MTU settings). Fix that and things will work; if you don't, the only software tweaks you could apply are more likely to result in lost data than a happy cluster.
-Greg

> Google searching has yielded nothing particularly useful on how to get this unstuck.
>
> ceph pg 1.323 query seems to hang forever, but it completed once last night and I noticed this:
>
> "peering_blocked_by_detail": [
>     {
>         "detail": "peering_blocked_by_history_les_bound"
>     }
>
> We have seen this before, and it was cleared by setting osd_find_best_info_ignore_history_les to true on the first two OSDs of the stuck PGs (that was on a 3-replica pool). That hasn't worked in this case, and I suspect the option needs to be set on either a majority of the OSDs, or on at least k OSDs, so that their data can be used while the history is ignored.
>
> We would really appreciate any guidance and/or help the community can offer!
>
>
> George V.
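
For reference, a rough sketch of how osd_find_best_info_ignore_history_les is usually toggled at runtime, only worth considering after any underlying corruption is fixed (per Greg's note above). The OSD ids are just the first two from this PG's acting set; whether injectargs takes effect immediately for this option on your Jewel build is an assumption, so check the command's output:

$ # Set the flag on the OSDs whose info should win, then force the PG to re-peer
$ ceph tell osd.595 injectargs '--osd_find_best_info_ignore_history_les=true'
$ ceph tell osd.1391 injectargs '--osd_find_best_info_ignore_history_les=true'
$ ceph osd down 595        # marks the primary down so the PG re-runs peering
$ # Once the PG is active+clean, turn the flag back off
$ ceph tell osd.595 injectargs '--osd_find_best_info_ignore_history_les=false'
$ ceph tell osd.1391 injectargs '--osd_find_best_info_ignore_history_les=false'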
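Since the "bad crc" points at hardware or network configuration, a few generic Linux checks one might run on both OSD hosts (the interface name eth0 is a placeholder for whichever NIC carries cluster traffic, and the ping target is the peer host from the logs):

$ # Look for rx/tx errors, drops or CRC counters on the cluster-facing NIC
$ ip -s link show eth0
$ ethtool -S eth0 | grep -iE 'err|drop|crc'
$ # Confirm both ends agree on the MTU
$ ip link show eth0 | grep mtu
$ # If jumbo frames are in use (MTU 9000), a do-not-fragment ping near the full
$ # frame size should still get through (8972 = 9000 - 20 IP - 8 ICMP)
$ ping -M do -s 8972 -c 5 XXX.XXX.XXX.192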