Hello,

On 20/03/2016 04:47, Christian Balzer wrote:
> That's not protection, that's an "uh-oh, something is wrong, you better
> check it out" notification, after which you get to spend a lot of time
> figuring out which is the good replica

In fact, I have never been confronted with this situation so far and I have
a couple of questions.

1. When it happens (i.e. a deep scrub fails), is it mentioned in the output
of the "ceph status" command, and, in that case, can you confirm that the
reported cluster health is different from "HEALTH_OK"?

2. Suppose, for instance, that it happens with PG id == 19.10 and that I
have 3 OSDs for this PG (because my pool has replica size == 3), and that
the OSDs concerned are OSD ids 1, 6 and 12. Can you tell me whether this
"naive" method is valid to solve the problem (and, if not, why)?

a) I ssh into the node which hosts osd-1 and launch this command:

~# id=1 && sha1sum /var/lib/ceph/osd/ceph-$id/current/19.10_head/* | sed "s|/ceph-$id/|/ceph-id/|" | sha1sum
055b0fd18cee4b158a8d336979de74d25fadc1a3 -

b) I ssh into the node which hosts osd-6 and launch this command:

~# id=6 && sha1sum /var/lib/ceph/osd/ceph-$id/current/19.10_head/* | sed "s|/ceph-$id/|/ceph-id/|" | sha1sum
055b0fd18cee4b158a8d336979de74d25fadc1a3 -

c) I ssh into the node which hosts osd-12 and launch this command:

~# id=12 && sha1sum /var/lib/ceph/osd/ceph-$id/current/19.10_head/* | sed "s|/ceph-$id/|/ceph-id/|" | sha1sum
3f786850e387550fdab836ed7e6dc881de23001b -

I notice that the result is different for osd-12, so it is the "bad" OSD.
Therefore, on the node which hosts osd-12, I launch this command:

id=12 && rm /var/lib/ceph/osd/ceph-$id/current/19.10_head/*

And now I can safely launch this command:

ceph pg repair 19.10

Is there a problem with this "naive" method? (A consolidated sketch of the
checks above is appended at the end of this message.)

--
François Lafont
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
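P.S. For clarity, here is a minimal, untested sketch of the whole procedure
I have in mind, covering questions 1 and 2. It assumes passwordless SSH as
root to the OSD hosts, and the host names node1, node6 and node12 are only
placeholders for the machines hosting osd-1, osd-6 and osd-12; it is meant
to illustrate the "naive" method, not as a recommendation:

#!/usr/bin/env bash
# Sketch of the "naive" method described above.
# Assumptions (hypothetical): passwordless SSH as root to the OSD hosts,
# and node1/node6/node12 are the machines hosting osd-1, osd-6 and osd-12.

pg=19.10

# Question 1: the scrub error should, as far as I know, show up in the
# cluster health (something other than HEALTH_OK, with the PG listed).
ceph status
ceph health detail

# Question 2: compare a per-PG checksum on each OSD host.
for pair in node1:1 node6:6 node12:12; do
    host=${pair%%:*}
    id=${pair##*:}
    # The sed call rewrites /ceph-$id/ to /ceph-id/ so that the per-file
    # sha1sum lines are identical across OSDs before the final sha1sum.
    sum=$(ssh "$host" "sha1sum /var/lib/ceph/osd/ceph-$id/current/${pg}_head/* | sed 's|/ceph-$id/|/ceph-id/|' | sha1sum")
    echo "osd-$id: $sum"
done

The OSD whose checksum differs from the other two would then be treated as
the "bad" one, as in steps a) to c) above.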