Could you elaborate on what constitutes deleting the PG in this instance? Is a simple `rm` of the directories with the PG number in current sufficient, or does it need some poking of anything else? (I've sketched what I had in mind at the bottom of this mail.)

It is conceivable that there is a fault with the disks; they are known to be 'faulty' in the general sense that they suffer from a cliff-edge performance issue. However, I'm somewhat confused about why this would suddenly start happening in the way it has been detected. We are past early-life failures, most of these disks don't appear to have any significant issues in their SMART data to indicate that write failures are occurring, and I hadn't seen this error once until a couple of weeks ago (we've been operating this cluster for over 2 years now).

The only versions I'm seeing running currently (just double-checked) are 10.2.5, 10.2.6 and 10.2.7. There was one node that had Hammer running on it a while back, but it's been on Jewel for months now, so I doubt it's related to that.
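For reference, this is roughly what I had in mind, on the affected (FileStore) OSD with the daemon stopped first. The OSD id 12 and PG id 1.23 are just placeholder examples from our layout, and I'm not certain the ceph-objectstore-tool invocation is the right or complete way to do this:

    # stop the OSD before touching anything under its data dir
    systemctl stop ceph-osd@12

    # the "simple rm" variant: remove the PG's directory from the
    # FileStore current/ tree
    rm -rf /var/lib/ceph/osd/ceph-12/current/1.23_head

    # or, if more than the directory needs cleaning up, the
    # ceph-objectstore-tool variant (guessing this is the proper route)
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 1.23 --op remove

    systemctl start ceph-osd@12

I'm assuming a plain rm would leave the PG's omap/leveldb metadata behind, which is why I'm asking whether the tool is the correct route instead; happy to be corrected on either.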