Never mind, I should’ve read the whole thread first.
All it takes for data loss is that an osd on server 1 is marked down and a write happens to an osd on server 2. Now the osd on server 2 goes down before the osd on server 1 has finished backfilling and the first osd receives a request to modify data in the object that it doesn't know the current state of. Tada, you have data loss.
I’m probably misunderstanding, but if an osd on server 1 is backfilling, and its only candidate to backfill from is an osd on server 2, and the latter goes down, then wouldn’t the osd on server 1 block, i.e., not accept requests to modify, until the osd on server 2 comes up again? Or is there a ‘hole’ here somewhere where server 1 *thinks* it’s done backfilling whereas the osdmap it used to backfill with was out of date?
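
To make my question concrete, here is a toy model of the scenario in Python. It is only a sketch of how I picture it, not how Ceph actually implements peering or backfill: the `ToyOSD` class, the single integer object version, and both check functions are names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyOSD:
    """Hypothetical stand-in for one replica; not a real Ceph structure."""
    name: str
    up: bool = True
    version: int = 0           # version of the object this replica holds
    backfilling: bool = False  # True while still copying from a peer


def write(osds, new_version):
    """Apply a write to every replica that is up and not backfilling."""
    for osd in osds:
        if osd.up and not osd.backfilling:
            osd.version = new_version


def naive_can_serve(osd):
    """The 'hole': trusts the replica's own idea of being done."""
    return osd.up and not osd.backfilling


def safe_can_serve(osd, latest_acked_version):
    """Blocks unless the replica provably holds the latest acked version."""
    return naive_can_serve(osd) and osd.version == latest_acked_version


def run_scenario():
    osd1 = ToyOSD("osd on server 1")
    osd2 = ToyOSD("osd on server 2")

    latest = 1
    write([osd1, osd2], latest)   # both replicas hold version 1

    osd1.up = False               # osd on server 1 is marked down
    latest = 2
    write([osd1, osd2], latest)   # only the osd on server 2 gets version 2

    osd1.up = True
    osd1.backfilling = True       # starts backfilling from server 2
    osd2.up = False               # source goes down before backfill ends
    return osd1, osd2, latest
```

In this model the osd on server 1 is stuck at version 1 and blocks for as long as `backfilling` stays set. Data loss would only occur if something cleared that flag early (for example, because of an out-of-date map) and the check in use was `naive_can_serve` rather than `safe_can_serve`.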
Thanks,
Hans
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com