Hello,

I'm using ceph-jewel 10.2.7 for some tests. I discovered that when an OSD is full (e.g. full_ratio=0.95), client writes fail, which is normal. However, a full OSD does not stop a recovering cluster from writing data to it, which can push the OSD's used ratio from 95% to 100%. When that happens, the OSD goes down with "no space left on device" and cannot start up anymore.

So the question is: can the cluster automatically stop recovery while an OSD is approaching full, without setting the norecover flag manually? Or is this already fixed in the latest version?

Consider this situation: a half-full cluster with many OSDs. Through some bad luck in the middle of the night (a network link down, a server down, or similar), some OSDs go down/out and trigger cluster recovery, which drives some other healthy OSDs to 100% usage (I have little operations and maintenance experience, so please correct me if I'm wrong). Unluckily, this spreads like a plague and takes down many more OSDs. It may be easy to fix one OSD that went down this way, but it is a disaster to fix 10+ OSDs with 100% of their space used.

Here are my test environment and steps: three nodes, each node running one monitor and one OSD (a 10G HDD for convenience), all in VMs. The ceph.conf is basic, and the pool size is set to 2. I used 'rados bench' to write data to the OSDs.

1. Lower the full ratios:

# ceph pg set_full_ratio 0.8
# ceph pg set_nearfull_ratio 0.7

2. Write data; when an OSD is nearing full, stop writing and mark one OSD out:

# ceph osd out 0

3. Wait for cluster recovery to finish, then run:

# ceph osd df
# ceph osd tree

We can see that the other OSDs are down.

Thanks and Best Regards!

He Handong
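
P.S. Some extra details and sketches below, in case anyone wants to reproduce or work around this.

The "basic" ceph.conf amounts to little more than the following (the hostnames and addresses are placeholders for my three VMs):

    [global]
    fsid = <your fsid>
    mon initial members = node1, node2, node3
    mon host = 192.168.1.11, 192.168.1.12, 192.168.1.13
    osd pool default size = 2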
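
The whole test collected into one script (a sketch of the steps above; the pool name 'bench' and the pg count are just what I happened to use):

    # assumes a healthy 3-OSD cluster and the admin keyring on this host
    ceph osd pool create bench 64 64
    ceph osd pool set bench size 2

    # lower the ratios so a 10G disk fills up quickly
    ceph pg set_full_ratio 0.8
    ceph pg set_nearfull_ratio 0.7

    # write until one OSD approaches the full ratio, then Ctrl-C
    rados bench -p bench 600 write --no-cleanup

    # trigger recovery by marking one OSD out
    ceph osd out 0

    # once recovery has run for a while, check the damage
    ceph osd df
    ceph osd tree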
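
The only mitigation I currently know is to set the flags by hand once I notice the problem:

    ceph osd set norecover
    ceph osd set nobackfill
    # ...free space on or replace the full OSDs, then:
    ceph osd unset nobackfill
    ceph osd unset norecover

As a stopgap one could script this, e.g. a cron job that sets norecover as soon as any OSD crosses a threshold. A naive sketch (the 85% threshold is arbitrary, and the "utilization" field name should be checked against your version's 'ceph osd df -f json' output):

    #!/bin/sh
    # set norecover when any OSD is more than 85% used
    ceph osd df -f json | python -c '
    import json, subprocess, sys
    nodes = json.load(sys.stdin)["nodes"]
    if any(n["utilization"] > 85.0 for n in nodes):
        subprocess.call(["ceph", "osd", "set", "norecover"])
    '

But this is exactly the kind of thing I would expect the cluster to do by itself, hence the question above.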