We're running a 3-machine cluster with ceph 0.52 on Ubuntu Precise. Two of the machines have 5 osds each, and the third runs a rados gateway; each machine also runs a mon. The default crushmap puts one copy of the data on each osd machine, so 2 copies total. All of our reads and writes go through the S3 gateway.

We were curious how ceph handles inconsistent file states, so we uploaded a text file over S3, then went into the osd data directory (/var/lib/ceph/osd/ceph-9/...) and changed that object's file on disk on one of the two osds holding it (a rough command-by-command sketch is in the P.S. below). The cluster didn't detect any errors on its own, still reported HEALTH_OK, and S3 happily returned the broken copy of the file.

We then ran "ceph osd repair 9" (which is cheating, since we knew which osd we'd broken it on). It discovered the error but didn't fix it, and "ceph health detail" started returning "pg 9.7 is active+clean+inconsistent, acting [9,4]". Additional repair attempts didn't help.

We then restarted all of the osds. The cluster went back to reporting HEALTH_OK, and kept reporting that even after we re-ran the repair command. The repair command still detected the inconsistency, though:

  2012-09-28 21:46:58.068140 osd.9 [ERR] repair 9.7 994c51ff/4712.1_functions_admin.php/head//9 on disk size (90965) does not match object info size (91183)

Finally, we tried using S3 to download the broken file again. Every time we tried, it sent us the broken copy of the file, and the rados gateway crashed as soon as the send finished. We restarted the gateway and were able to reproduce the crash.

Just curious to know more about the recovery behavior. How is ceph designed to recover from inconsistent states?

Thanks!
Sergey
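
P.S. For anyone who wants to reproduce this, here is roughly the sequence we ran. The bucket name below is a placeholder, s3cmd is just one example of an S3 client pointed at the gateway (we'd expect any client to behave the same), and the exact path of the object's backing file under the osd data directory will differ on your cluster:

  # upload a small text file through the rados gateway
  s3cmd put functions_admin.php s3://test-bucket/functions_admin.php

  # locate the object's backing file on one of the two osds that hold
  # its pg (the exact layout under current/ varies)
  find /var/lib/ceph/osd/ceph-9/current -name '*functions_admin.php*'

  # change the object's contents on disk (we hand-edited the file;
  # chopping a few bytes off the end has the same effect)
  truncate -s -16 "$(find /var/lib/ceph/osd/ceph-9/current -name '*functions_admin.php*' | head -n 1)"

  # at this point the cluster still reports HEALTH_OK
  ceph health detail

  # ask the osd we tampered with to repair; this is when the
  # "on disk size ... does not match object info size" error shows up
  # and the pg goes active+clean+inconsistent
  ceph osd repair 9
  ceph health detail

  # download the object again through the gateway; we always got the
  # corrupted copy back, and radosgw crashed right after the send
  s3cmd get s3://test-bucket/functions_admin.php /tmp/functions_admin.php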