On 13/05/2019 at 16:20, Kevin Flöh wrote:
> Dear ceph experts,

With 3+1 you only allow a single OSD failure per PG at a given time. You have 4096 PGs and 96 OSDs; having 2 OSDs fail at the same time on 2 separate servers (assuming standard crush rules) is a death sentence for the data on any PG that uses both of those OSDs (the ones not fully recovered before the second failure).

Depending on the data stored (CephFS?) you can probably recover most of it, but some of it is irremediably lost. If you can recover the data from the failed OSDs as it was at the time they failed, you might be able to recover some of your lost data (with the help of the Ceph devs); if not, there is nothing to be done. In the latter case I'd add a new server, use at least 3+2 for a fresh pool instead of 3+1, and begin moving the data to it.

The 12.2 + 13.2 mix is a potential problem in addition to the one above, but it's a different one.

Best regards,

Lionel
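PS: in case it helps, here is a rough sketch of what the fresh 3+2 pool could look like on the CLI. The profile, pool, filesystem and path names below are only placeholders, and the pg count and failure domain have to be adapted to your cluster:

  # new 3+2 erasure-code profile, one shard per host
  ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host

  # fresh data pool using that profile (pick a sensible pg_num)
  ceph osd pool create cephfs_data_ec32 512 512 erasure ec32

  # EC data pools used by CephFS need overwrites enabled (BlueStore OSDs)
  ceph osd pool set cephfs_data_ec32 allow_ec_overwrites true

  # attach the pool to the filesystem, point a new directory at it,
  # then copy the data over (cp/rsync) and drop the old copies
  ceph fs add_data_pool <your_fs> cephfs_data_ec32
  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec32 /mnt/cephfs/newdir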