Re: Bug in OSD Maps

Are those all currently running versions?  You should always run your cluster on the exact same version.
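A quick way to double-check is to ask the running daemons themselves rather than relying on what packages are installed. Roughly something like the following (a sketch, assuming admin access on a Jewel-era cluster):

    ceph tell osd.* version                     # each running OSD reports its own version
    ceph daemon mon.$(hostname -s) version      # run locally on each monitor node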

On Fri, May 26, 2017 at 6:05 AM Stuart Harland <s.harland@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Could you elaborate on what exactly constitutes deleting the PG in this instance? Is a simple `rm` of the directories under `current` carrying the PG number sufficient, or does it need some poking of anything else?
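My guess is that the intended route is something like the ceph-objectstore-tool invocation below, with the OSD stopped, rather than a bare `rm` — but that's an assumption on my part (default FileStore data/journal paths, <id> and <pgid> as placeholders):

    systemctl stop ceph-osd@<id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
        --pgid <pgid> --op remove
    systemctl start ceph-osd@<id>

Is that the right approach, and only once we've confirmed this is the correct copy of the PG to drop?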

It is conceivable that there is a fault with the disks; they are known to be ‘faulty’ in the general sense that they suffer a cliff-edge performance issue. However, I’m somewhat confused about why this would suddenly start happening in the way it has been detected.

We are past early-life failures, and most of these disks don’t show anything significant in their SMART data to indicate that write failures are occurring. I hadn’t seen this error once until a couple of weeks ago, and we’ve been operating this cluster for over 2 years now.
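(By SMART data I mean the usual smartctl output, roughly:

    smartctl -a /dev/sdX    # attributes and error log for each OSD’s data disk

with /dev/sdX standing in for the relevant device.)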

The only versions I’m seeing running currently (just double-checked) are 10.2.5, 10.2.6 and 10.2.7. There was one node that had Hammer running on it a while back, but it’s been running Jewel for months now, so I doubt it’s related to that.



On 26 May 2017, at 00:22, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
