On Mon, 22 Nov 2021 at 11:40, Marius Leustean <marius.leus@xxxxxxxxx> wrote:
>
> I do not know what you mean by this, you can tune this with your min_size
> and replication. It is hard to believe that failing hard drives end up in
> exactly the same PG. I wonder if this is not more related to your
> 'non-default' config?
>
> In my setup size=2 and min_size=1. I had cases where one PG stuck in
> peering caused all the VMs in that pool to get no I/O at all. My setup is
> really "default", deployed with minimal config changes derived from
> ceph-ansible and with an even number of OSDs per host.

No, the default is size=3, min_size=2 for the very reason that you need to
be able to continue when one OSD is down. You put yourself in this position
by reducing that safety margin, and Ceph reacted by stopping writes rather
than letting you lose data. If you were afraid of losing access, you should
have tuned it in the other direction instead: size=4 or 5 with min_size=2
or 3, so you could lose two drives and still recover and keep serving I/O.

-- 
May the most significant bit of your life be positive.
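
For reference, raising the replication on an existing pool is only a couple
of commands. A rough sketch, where "vms" is a placeholder pool name (use
"ceph osd pool ls detail" to see your own pools and their current values):

    # check what the pool is set to now
    ceph osd pool get vms size
    ceph osd pool get vms min_size

    # back to the defaults; ceph will backfill the extra copies in the
    # background, so expect data movement for a while
    ceph osd pool set vms size 3
    ceph osd pool set vms min_size 2

Newly created pools take their values from osd_pool_default_size and
osd_pool_default_min_size, so adjust those in your config as well if you
plan to redeploy.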