Exactly, we minimize the blast radius / data destruction by allocating
more DB/WAL devices of smaller size rather than fewer of larger size. We
encountered this same issue on an earlier iteration of our hardware
design. With rotational drives and NVMes, we are now aiming for a 6:1
ratio (rotational drives per NVMe) based on our CRUSH rules, rotational
disk sizing, NVMe sizing, server sizing, EC setup, etc. Make sure to use
write-friendly NVMes for DB/WAL, and the failures should be far fewer
and farther between.

On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>
> On Thu, Sep 9, 2021 at 16:09, Michal Strnad <michal.strnad@xxxxxxxxx> wrote:
> > When the disk with the DB dies, it makes all dependent OSDs
> > inaccessible (six or eight in our environment). How do you do it in
> > your environment?
>
> Have two SSDs for 8 OSDs, so only half go away when one SSD dies.
>
> --
> May the most significant bit of your life be positive.
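
P.S. For reference, a minimal sketch of how a layout like ours can be
expressed with ceph-volume's batch mode; the device paths below are
placeholders, not our actual hardware:

    # Six rotational data drives sharing one write-friendly NVMe;
    # ceph-volume slices the NVMe into one DB/WAL logical volume per OSD.
    # Device names are examples only -- adjust to your hardware.
    ceph-volume lvm batch \
        /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
        --db-devices /dev/nvme0n1

With that ratio, losing one NVMe takes out six OSDs rather than the
whole box, which is the blast-radius trade-off described above.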