Re: Worst thing that can happen if I have size= 2


 



On 05/02/2021 20:10, Mario Giammarco wrote:

It is not that one morning I woke up and threw some random hardware together;
I followed the guidelines.
The expected result was:
- if a disk (or more) breaks, work goes on
- if a server breaks, the VMs on that server start on another server and
work goes on.

The actual result: one disk broke, Ceph filled the other one in the same server,
it reached 90% and EVERYTHING stopped, including all VMs. The customer lost
unsaved data and could not run the VMs it needs to keep working.
Not very "HA" as hoped.

With three OSD hosts, each with two disks, size=3 and default CRUSH rules (i.e. each replica goes to a different host), each OSD host would expect to get roughly 1/3 of the total data. Under normal running that means each disk holds about 1/6 of the total data.

When a single disk failed in your scenario above, all three hosts were still available and each still gets 1/3 of the total data. Because one disk failed, the surviving disk on that host has to store the replicas that were on the failed disk as well as its own (so 2/6 of the total data - double what it had before). Reaching 90% full on the surviving disk suggests it was (at least) 45% full under normal running.
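A minimal sketch of that arithmetic, assuming 3 hosts with 2 equal-sized disks each, size=3, one replica per host, and a 90% stop threshold (the figures are illustrative, not taken from your actual cluster):

    # capacity_sketch.py - per-disk load before/after losing one disk on a host
    hosts = 3
    disks_per_host = 2
    stop_ratio = 0.90  # utilisation at which the cluster stopped in this case

    host_share = 1.0 / hosts                                   # data per host
    disk_normal = host_share / disks_per_host                  # ~1/6 per disk
    disk_degraded = host_share / (disks_per_host - 1)          # ~1/3 per disk

    # highest fill level a disk can be at *before* the failure and still stay
    # under the stop threshold after absorbing its failed peer's replicas
    max_safe_fill = stop_ratio * disk_normal / disk_degraded

    print(f"per-disk share, normal:   {disk_normal:.0%}")      # ~17%
    print(f"per-disk share, degraded: {disk_degraded:.0%}")    # ~33%
    print(f"max safe fill before failure: {max_safe_fill:.0%}") # ~45%

So with two disks per host, any disk that is much above ~45% full cannot absorb its neighbour's data without hitting the threshold.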

Ceph is doing what it's supposed to in this case; the issue is that the disks haven't been sized large enough to allow for this failure.

Simon
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


