Hi all!

We are seeing roughly one scrub error (one inconsistent PG) every two days. As far as I know, the fix is to issue "ceph pg repair", which works fine for us. I have a few questions about how the Ceph cluster behaves in this case:

1. After Ceph detects the scrub error, the PG is marked as inconsistent. Does that mean that all I/O to this PG is blocked until it is repaired?

2. Is this rate of scrub errors normal? We currently store only 150 TB in our cluster, distributed over 720 disks of 2 TB each.

3. As far as I know, "ceph pg repair" just copies the primary's copy of the PG to all replicas. Is this still the case? What if the primary copy is the one with errors? We use 4x replication, and it would be great if Ceph used, for recovery, a replica whose checksum matches the majority of the replicas.

4. Some of these errors occur at night. Since Ceph reports this as a critical error, our on-call shift is called and woken up just to issue a single command. Do you see any problems with triggering this command automatically from a monitoring event? Is there a reason why Ceph doesn't resolve these errors itself when it has enough replicas to do so?

Regards,
Christian

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com