On 7/25/19 9:19 AM, Xiaoxi Chen wrote:
> We have hit this case in production as well, but my solution was to
> change min_size = 1 immediately so that the PGs went back to active
> right away.
>
> That somewhat trades off reliability (durability) against availability
> during that 15-minute window, but if you are certain that one of the
> two "failures" is due to a recoverable issue, it is worth doing.
>

That's actually dangerous, imho.

While min_size=1 is set you will be mutating data on that single
disk/OSD. If the other two OSDs come back, recovery will start. If that
single disk/OSD then dies while performing the recovery, you have lost
data.

The PG (or PGs) becomes inactive and you either need to perform data
recovery on the failed disk or revert to the last state.

I can't take that risk in this situation.

Wido

> My 0.02
>
> Wido den Hollander <wido@xxxxxxxx> wrote on Thu, Jul 25, 2019 at 3:48 AM:
>
> > On 7/24/19 9:35 PM, Mark Schouten wrote:
> > > I'd say the cure is worse than the issue you're trying to fix, but
> > > that's my two cents.
> >
> > I'm not completely happy with it either. Yes, the price goes up and
> > latency increases as well.
> >
> > Right now I'm just trying to find a clever solution to this. It's a
> > 2k OSD cluster and the likelihood of a host or OSD crashing is
> > reasonable while you are performing maintenance on a different host.
> >
> > All kinds of things have crossed my mind, and using size=4 is one of
> > them.
> >
> > Wido
> >
> > > Mark Schouten
> > >
> > > > On 24 Jul 2019, at 21:22, Wido den Hollander <wido@xxxxxxxx>
> > > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Is anybody using 4x (size=4, min_size=2) replication with Ceph?
> > > >
> > > > The reason I'm asking is that a customer of mine asked me for a
> > > > solution to prevent a situation which occurred:
> > > >
> > > > A cluster running with size=3 and replication over different
> > > > racks was being upgraded from 13.2.5 to 13.2.6.
> > > >
> > > > During the upgrade, which involved patching the OS as well, they
> > > > rebooted one of the nodes. During that reboot a node in a
> > > > different rack suddenly rebooted as well. It was unclear why this
> > > > happened, but the node was gone.
> > > >
> > > > While the upgraded node was rebooting and the other node was
> > > > down, about 120 PGs were inactive due to min_size=2.
> > > >
> > > > Between waiting for the nodes to come back and for recovery to
> > > > finish, it took about 15 minutes before all VMs running inside
> > > > OpenStack were back again.
> > > >
> > > > While you are upgrading or performing any maintenance with
> > > > size=3, you can't tolerate the failure of another node, as that
> > > > will cause PGs to go inactive.
> > > >
> > > > This made me think about using size=4 and min_size=2 to prevent
> > > > this situation.
> > > >
> > > > This obviously has implications for write latency and cost, but
> > > > it would prevent such a situation.
> > > >
> > > > Is anybody here running a Ceph cluster with size=4 and min_size=2
> > > > for this reason?
> > > >
> > > > Thank you,
> > > >
> > > > Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
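
For reference, the settings discussed above map onto the following
commands. This is only a minimal sketch, assuming a replicated pool
named "vms" (the pool name is hypothetical); note that raising size
from 3 to 4 triggers backfill of a fourth replica for every PG in the
pool.

    # Check the current replication settings of the pool (hypothetical pool "vms")
    ceph osd pool get vms size
    ceph osd pool get vms min_size

    # Move to 4 replicas while still requiring 2 replicas for I/O
    ceph osd pool set vms size 4
    ceph osd pool set vms min_size 2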