Re: Worst thing that can happen if I have size= 2

> - three servers
> - three monitors
> - 6 osd (two per server)
> - size=3 and min_size=2

This is a set-up that I would not run at all. The first reason is that Ceph lives on the law of large numbers, and 6 is a small number; hence your OSDs fill up because the data is distributed unevenly across so few of them.
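For illustration, the imbalance is easy to check on a running cluster; a quick sketch (exact output columns and numbers vary by release):

  # Per-OSD utilisation; with only 6 OSDs the %USE values tend to
  # spread widely around the average.
  ceph osd df tree

  # `ceph osd df` also prints MIN/MAX VAR and STDDEV at the bottom,
  # which is a handy single number to watch.
  ceph osd df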

What comes to my mind instead is a hyper-converged server with 6+ disks in a RAID10 array, ideally behind a good controller with battery-backed or other non-volatile cache. Ceph will never beat that performance. Put in some extra disks as hot spares and you have close to self-healing storage.
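Purely as an illustration (a good hardware controller would do this for you), a software-RAID10 layout with a hot spare would look roughly like this; the device names are made up:

  # 6 data disks in RAID10 plus one hot spare (devices hypothetical)
  mdadm --create /dev/md0 --level=10 --raid-devices=6 --spare-devices=1 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh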

Such a small ceph cluster will inherit all the baddies of ceph (performance, maintenance) without giving any of the goodies (scale-out, self-healing, proper distributed raid protection). Ceph needs scale to perform well and to pay off the maintenance and architectural effort.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Mario Giammarco <mgiammarco@xxxxxxxxx>
Sent: 04 February 2021 11:29:49
To: Dan van der Ster
Cc: Ceph Users
Subject:  Re: Worst thing that can happen if I have size= 2

On Wed, 3 Feb 2021 at 21:22, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

>
> Lastly, if you can't afford 3x replicas, then use 2+2 erasure coding if
> possible.
>
>
I will investigate; I heard that erasure coding is slow.
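For reference, a 2+2 profile and pool would be created roughly like this; the names are only examples, and note that with crush-failure-domain=host it needs at least k+m=4 hosts, so it would not fit a three-server setup without lowering the failure domain:

  # Hypothetical profile and pool names
  ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host
  ceph osd pool create ecpool 64 64 erasure ec22

  # RBD on EC pools needs overwrites enabled (Luminous and later),
  # and the image metadata still lives in a replicated pool:
  ceph osd pool set ecpool allow_ec_overwrites true
  rbd create --size 10G --data-pool ecpool rbd/testimage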

Anyway, I will explain the reason for this thread.
At my customers I usually run Proxmox+Ceph with:

- three servers
- three monitors
- 6 osd (two per server)
- size=3 and min_size=2 (set roughly as in the sketch below)
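For completeness, those pool flags are set and checked roughly like this; the pool name is just an example:

  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2
  ceph osd pool get rbd size
  ceph osd pool get rbd min_size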

I followed the recommendations to stay safe.
But one day one disk of one server broke; the OSDs were at 55%.
What happened then?
Ceph started filling the remaining OSDs to maintain size=3.
The OSDs reached 90% and Ceph stopped everything.
The customer's VMs froze, and the customer lost time and some data that had
not yet been written to disk.
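For anyone hitting the same problem: the point where Ceph "stopped all" is governed by the full ratios; a rough sketch of how to inspect them and buy emergency headroom (defaults from memory are roughly nearfull 0.85, backfillfull 0.90, full 0.95 on recent releases):

  # Show the currently configured ratios
  ceph osd dump | grep ratio

  # Emergency stop-gap only -- this does not fix the lack of capacity
  ceph osd set-full-ratio 0.96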

So I got angry... size=3 and the customer still loses time and data?






> Cheers, Dan
>
> On Wed, Feb 3, 2021, 8:49 PM Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:
>
>> Thanks Simon, and thanks to the other people who have replied.
>> Sorry, let me try to explain myself better.
>> It is clear to me that if I have two copies of the data, one breaks,
>> and while Ceph is recreating it the disk holding the second copy also
>> breaks, then I lose the data.
>> That is obvious, and a bit paranoid, because many servers at many
>> customers run on RAID1, so you are effectively saying: yes, you have
>> two copies of the data, but both can break. Consider that in Ceph
>> recovery is automatic, while with RAID1 someone must go to the
>> customer and replace the disks by hand. So Ceph is already an
>> improvement in this case, even with size=2. With size=3 and min_size=2
>> it is a bigger improvement, I know.
>>
>> What I am asking is this: what happens with min_size=1 in a
>> split-brain, network-down or similar situation? Does Ceph block writes
>> because it has no quorum on the monitors? Are there failure scenarios
>> that I have not considered?
>> Thanks again!
>> Mario
>>
>>
>>
>> On Wed, 3 Feb 2021 at 17:42, Simon Ironside <sironside@xxxxxxxxxxxxx>
>> wrote:
>>
>> > On 03/02/2021 09:24, Mario Giammarco wrote:
>> > > Hello,
>> > > Imagine this situation:
>> > > - 3 servers with ceph
>> > > - a pool with size 2 min 1
>> > >
>> > > I know perfectly well that size 3 and min 2 is better.
>> > > I would like to know what is the worst thing that can happen:
>> >
>> > Hi Mario,
>> >
>> > This thread is worth a read; it's an oldie but a goodie:
>> >
>> >
>> >
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
>> >
>> > Especially this post, which helped me understand the importance of
>> > min_size=2
>> >
>> >
>> >
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014892.html
>> >
>> > Cheers,
>> > Simon
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx