Re: Changing replica size of a running pool


 



On 05/05/17 21:32, Alejandro Comisario wrote:
Thanks David!
Anyone? More thoughts?

On Wed, May 3, 2017 at 3:38 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
Those are both things that people have done, and both work.  Neither is optimal, but both options work fine.  The best option is definitely to just get a third node now, as you aren't going to get any additional usable space out of it later.  Your usable space in a 2-node size 2 cluster and a 3-node size 3 cluster is identical.
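To make that concrete (assuming, purely for illustration, identical nodes with 10 TB of raw capacity each):

    2 nodes x 10 TB raw = 20 TB raw; size 2 -> 20 / 2 = 10 TB usable
    3 nodes x 10 TB raw = 30 TB raw; size 3 -> 30 / 3 = 10 TB usable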

If getting a third node is not possible, I would recommend a size 2, min_size 2 configuration.  You will block writes if either of your nodes or any copy of your data is down, but you will not get into the inconsistent state that can happen with min_size 1 (and you can always set the min_size of a pool to 1 on the fly to perform maintenance).  If you go with the option of using a failure domain of OSD instead of host with size 3, then a single node going down will block writes to your cluster.  The only thing you gain from that is having 3 physical copies of the data until you get a third node, at the cost of a lot of backfilling when you change the crush rule.
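As a rough sketch of what that looks like on a running pool (the pool name "mypool" below is just a placeholder), size and min_size can be changed at any time with:

    ceph osd pool set mypool size 2
    ceph osd pool set mypool min_size 2
    # only temporarily, for maintenance; set it back to 2 afterwards
    ceph osd pool set mypool min_size 1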

A more complex option, which I think would be a better solution than your 2 options, would be to create 2 hosts in your crush map for each physical host and split the OSDs of each physical host evenly between them.  That way you can have 2 copies of the data on a given node, but never all 3; you keep your 3 copies of the data with a guarantee that not all 3 are on the same host.  Assuming min_size 2, you will still block writes if you restart either node.
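A minimal sketch of that CRUSH split, assuming hypothetical bucket names node1-a/node1-b and an example OSD weight of 1.0 (use your real OSD weights, and expect backfill as OSDs are moved):

    ceph osd crush add-bucket node1-a host
    ceph osd crush add-bucket node1-b host
    ceph osd crush move node1-a root=default
    ceph osd crush move node1-b root=default
    # split node1's OSDs between the two new host buckets, e.g.:
    ceph osd crush set osd.0 1.0 root=default host=node1-a
    ceph osd crush set osd.1 1.0 root=default host=node1-b

Repeat for the second physical node; the existing replicated ruleset that chooses leaves of type host will then spread the 3 copies across the 4 host buckets.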

Smart idea.
Or if you have the space, size 4 min_size 2, and then you can still lose a node. You might think that takes more space, but in a way it doesn't, if you count the free space you have to reserve for recovery. If one node of the size 3 setup dies, the other has to recover until it holds 2 copies of everything, and at that point it uses the same space as the size 4 pool. If the size 4 pool loses a node, it won't be able to recover further... it stays at 2 copies, which is what the size 3 pool would have been after recovery. So it's as if it's pre-recovered. But you will probably get a bit more write latency in this setup.
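If you go that way it's the same pool settings as above (again with a placeholder pool name), and it's worth checking the available capacity first:

    ceph df
    ceph osd pool set mypool size 4
    ceph osd pool set mypool min_size 2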

If modifying the hosts in your crush map doesn't sound daunting, then I would recommend going that route... For most people that is more complex than they'd like to get, in which case I would say size 2, min_size 2 is the way to go until you get a third node.  #my2cents

On Wed, May 3, 2017 at 12:41 PM Maximiliano Venesio <massimo@xxxxxxxxxxx> wrote:
Guys hi.

I have a Jewel cluster composed of two storage servers, which are configured in
the crush map as different buckets to store data.

I have to configure two new pools on this cluster, with the certainty
that I'll have to add more servers in the short term.

Taking into account that the recommended replication size for every
pool is 3, I'm considering two possible scenarios.

1) Set the replica size to 2 now, and in the future change the replica
size to 3 on the running pool.
Is that possible? Could I run into serious issues with the rebalance of the
PGs when changing the pool size on the fly? (See the command sketch after scenario 2.)

2) Set the replica size to 3 and change the ruleset to replicate by
OSD instead of host now, and in the future change that rule back to
replicating by host on the running pool.
Is that possible? Could I run into serious issues with the rebalance of the
PGs when changing the ruleset on a running pool?
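For reference, both scenarios come down to single commands on a running pool (the pool and rule names below are placeholders; Jewel still calls the pool setting crush_ruleset, and either change will trigger a large backfill):

    # scenario 1: raise the replica count later
    ceph osd pool set mypool size 3
    # scenario 2: a rule with OSD failure domain now, switch back to a host-based rule later
    ceph osd crush rule create-simple replicate-by-osd default osd
    ceph osd pool set mypool crush_ruleset <rule id>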

Which do you think is the best option?


Thanks in advance.


Maximiliano Venesio
Chief Cloud Architect | NUBELIU
E-mail: massimo@nubeliu.com
Cell: +54 9 11 3770 1853
www.nubeliu.com





--
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@xxxxxxxxxxx
Cell: +54 9 11 3770 1857
www.nubeliu.com




-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
