Over the past year, I've had 3 instances where I had to manually repair an object due to size. In this case, I was immediately disappointed to discover what I think is evidence that only 1 of the 3 replicas is good. It got worse when a segfault occurred as I attempted to flush the journal for one of the seemingly bad replicas.
Below is a segfault from ceph-osd -i 160 --flush-journal
More logs and command history can be found here:
So far, I've copied the object file to a temporary backup location, set noout, stopped the OSD service for each OSD associated with that PG, flushed the journals, and made a second copy of the object files after the flush.
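For reference, the sequence above looks roughly like the sketch below. It is a dry-run outline rather than a verbatim command history: only OSD 160 is named in this post, so the OSD list is a placeholder to be filled in, and the `systemctl` unit name assumes a systemd-based install (other init systems use a different stop command). The `run` and `prepare_osds` helpers are hypothetical names for illustration.

```shell
#!/bin/sh
# Sketch only. By default nothing is executed; commands are just printed.
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually run the commands
OSDS="${OSDS:-160}"       # space-separated OSD ids for the PG (placeholder list)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Set noout so the cluster does not rebalance, then stop each OSD
# and flush its journal. Object files are copied aside by hand
# before and after this step (paths omitted here).
prepare_osds() {
    run ceph osd set noout
    for id in $OSDS; do
        run systemctl stop "ceph-osd@$id"   # assumes systemd
        run ceph-osd -i "$id" --flush-journal
    done
}

prepare_osds
```

Running it as-is only prints the commands, which makes it easy to check the order before touching a live cluster.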
Any help would be greatly appreciated.
I'm considering just deleting the 2 known bad files and attempting a ceph pg repair, but I'm not sure that will work with only 1 good replica.