I'm testing octopus 15.2.16 and ran into a problem right away. I'm filling up a small test cluster with 3 hosts (3x3 OSDs) and killed one OSD to see how recovery works. I have one 4+2 EC pool with failure domain host, and on 1 PG of this pool 2 (!!!) shards are missing. This most degraded PG is not becoming active; it's stuck inactive but peered.

Questions:

- How can 2 OSDs be missing if only 1 OSD is down?
- Wasn't there an important code change to allow recovery for an EC PG with at least k shards present, even if min_size>k? Do I have to set something?
- If the PG should recover, why is it not prioritised, considering its severe degradation compared with all other PGs?

I have already increased these crush tunables and executed a pg repeer, to no avail (a crushtool check is in the P.S. below):

tunable choose_total_tries 250    <-- default 100

rule fs-data {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 50    <-- default 5
        step set_choose_tries 200    <-- default 100
        step take default
        step choose indep 0 type osd
        step emit
}

Ceph health detail says about this:

[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
    pg 4.32 is stuck inactive for 37m, current state recovery_wait+undersized+degraded+remapped+peered, last acting [1,2147483647,2147483647,4,5,2]

I don't want to cheat and set min_size=k on this pool. It should work by itself.

Thanks for any pointers!

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
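
P.S. In case anyone wants to poke at the same state, this is roughly how I've been inspecting the PG. The 2147483647 entries in the acting set are 2^31-1, i.e. CRUSH_ITEM_NONE, so CRUSH simply returned no OSD for those two shard positions:

    # full peering/recovery state of the stuck PG, including
    # the up/acting sets and the per-shard peering info
    ceph pg 4.32 query

    # quick view of just the up/acting mapping
    ceph pg map 4.32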
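
Regarding the min_size question: as far as I can tell, the knob that was added for this is osd_allow_recovery_below_min_size, but I'm not certain that's the exact change I remember, so treat this as a guess:

    # whether recovery below min_size is allowed at all
    # (assumption on my part that this is the relevant option)
    ceph config get osd osd_allow_recovery_below_min_size

    # what min_size the pool actually has; I'm assuming here
    # that the pool carries the same name as the rule, fs-data
    ceph osd pool get fs-data min_size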
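
And to rule out the CRUSH rule itself, the mapping can be replayed offline with crushtool, simulating the dead OSD by setting its weight to 0 (osd.3 below is just a stand-in for whichever OSD was killed):

    # dump the live crushmap and test rule 1 (fs-data) for 6 shards
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 1 --num-rep 6 \
        --weight 3 0 --show-mappings --show-bad-mappings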