From: zrzhit@xxxxxxxxx [mailto:zrzhit@xxxxxxxxx] On Behalf Of Rongze Zhu
Sent: Monday, July 29, 2013 2:18 PM
To: Chen, Xiaoxi
Cc: Gregory Farnum; ceph-users@xxxxxxxxxxxxxx
Subject: Re: add crush rule in one command
On Sat, Jul 27, 2013 at 4:25 PM, Chen, Xiaoxi <xiaoxi.chen@xxxxxxxxx> wrote:
My 0.02:
1. Why do you need to set the map simultaneously for your purpose? It is obviously very important for Ceph to have an atomic CLI, but that is because the map may be changed by the cluster itself (a lost node, for example), not because of your case. Since Ceph distributes the map automatically, I really think it is a good idea to just change your own code so that map changes happen on only one node.
We need auto-scaling. When a storage node is added to the cluster, the puppet agent will deploy Ceph on the node and create a local pool for it. A dedicated node for creating pools adds complexity, because we need to elect the dedicated node, and that node is a single point of failure.
Well, from my point of view, it is not that easy to implement an atomic map-change CLI for Ceph; combining these 3 commands is obviously not enough. But I would be very happy if anybody implemented one.
Technically speaking, yes, it is a SPOF, but I would say it is OK from an engineering standpoint. Adding a node to a cluster does not happen every day, and you definitely cannot add a physical node “automatically”, so it is easy to check whether a dedicated control node (for map management) is alive before you do so. For instance, would you mind your console proxy machine to the backend cluster being a SPOF? If yes, just have two and try the other on failure; would you really write an “auto-election” and “fail-detection” application for that?
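The "have two and try the other on failure" idea can be sketched in a few lines. This is only an illustration: the function name, the node names, and the liveness check are hypothetical and are not part of Ceph or puppet.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_on_first_alive(nodes: Iterable[str],
                       is_alive: Callable[[str], bool],
                       action: Callable[[str], T]) -> T:
    """Run `action` on the first node that passes the liveness check.

    `nodes` is an ordered list of candidate control nodes (hypothetical
    names); `is_alive` and `action` are supplied by the caller.
    """
    for node in nodes:
        if is_alive(node):
            return action(node)
    raise RuntimeError("no control node is reachable")
```

A caller would pass something like `["ctl-a", "ctl-b"]` together with its own ping or SSH check; no election protocol is involved, the list order alone decides which node is tried first.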
There are many ways to do that, and we will evaluate them. Our goal is to make it highly available for our customers :)
2. We are also evaluating the pros and cons of the “local pool”. The only pro is that you save network BW for reads. You may want to mention latency; I used to agree, but we now have a complete latency breakdown for Ceph showing that network latency is negligible, even with a full-SSD setup. The remaining question is “how much BW can be saved?” Unless you know in advance that the workload is mostly reads, you still have to use a 10GbE link for Ceph to get balanced throughput. The drawbacks, though, are really obvious: live-migration complexity, management complexity, and so on.
I agree that there are some drawbacks to the "local pool" :) But I think the network is a shared resource, and we should avoid Ceph using excessive network resources (many enterprises in China use 1GbE links in their network environment).
The BW is largely decided by how much write throughput you would like to achieve. Unless the customer can be satisfied with <100MB/s aggregate BW for a single compute node (a typical 2U WSM/SNB node has 32 cores, which usually means at least 16 VMs, so roughly 6MB/s or less for a single VM), they will go to a 10Gb solution. There are still no enterprises that use Ceph in China (AFAIK), but from the list, it seems quite a lot of users go to full 10Gb or even an IB network.
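The per-VM figure above is simple division, and it can be checked with a short sketch. The 100 MB/s aggregate and the 16 VMs come from the message; the helper name is made up for illustration, and the 10GbE number is raw line rate, which real throughput will not reach.

```python
def per_vm_bandwidth_mb_s(aggregate_mb_s: float, vm_count: int) -> float:
    """Evenly split an aggregate bandwidth (MB/s) across `vm_count` VMs."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return aggregate_mb_s / vm_count


# 1 Gbit/s is 1000 Mbit/s, i.e. 125 MB/s before any protocol overhead.
LINE_RATE_1GBE_MB_S = 1000 / 8
# 10 Gbit/s is 1250 MB/s before any protocol overhead.
LINE_RATE_10GBE_MB_S = 10000 / 8

# Upper bound from the message: 100 MB/s aggregate shared by 16 VMs.
print(per_vm_bandwidth_mb_s(100, 16))                   # 6.25
# Same split at raw 10GbE line rate, for comparison only.
print(per_vm_bandwidth_mb_s(LINE_RATE_10GBE_MB_S, 16))  # 78.125
```

Since the aggregate on 1GbE is below 100 MB/s in practice, 6.25 MB/s is a ceiling for each VM, not an expected value.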
10GbE is much cheaper than we first thought; a 48-port 10GbE switch will only cost you 5~6K USD.
Xiaoxi
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Rongze Zhu
Sent: Friday, July 26, 2013 2:29 PM
To: Gregory Farnum
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: add crush rule in one command
On Fri, Jul 26, 2013 at 2:27 PM, Rongze Zhu <rongze@xxxxxxxxxxxxxxx> wrote:
On Fri, Jul 26, 2013 at 1:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Thu, Jul 25, 2013 at 7:41 PM, Rongze Zhu <rongze@xxxxxxxxxxxxxxx> wrote:
> Hi folks,
>
> Recently, I have been using puppet to deploy Ceph and integrate Ceph with OpenStack. We
> put compute and storage together in the same cluster, so nova-compute and
> OSDs will be on each server. We will create a local pool for each server,
> and the pool will use only the disks of that server. Local pools will be used by
> Nova for root disks and ephemeral disks.
Hmm, this is constraining Ceph quite a lot; I hope you've thought
about what this means in terms of data availability and even
utilization of your storage. :)
We will also create a global pool for Cinder; the IOPS of the global pool will be better than those of a local pool.
The benefit of local pools is reduced network traffic between servers and improved storage management. We use the same Ceph Gluster for Nova, Cinder, and Glance, and create different pools (and different rules) for them. Maybe it needs more testing :)
s/Gluster/Cluster/g
> In order to use the local pools, I need to add some rules for the local pools
> to ensure that they use only local disks. There is only one way to add
> a rule in Ceph:
>
> ceph osd getcrushmap -o crush-map
> crushtool -d crush-map -o crush-map.txt
> # edit crush-map.txt to add the rule
> crushtool -c crush-map.txt -o new-crush-map
> ceph osd setcrushmap -i new-crush-map
>
> If multiple servers set the crush map simultaneously (the puppet agent will do that),
> there is a possibility of consistency problems. So a command
> for adding a rule would be very convenient. Such as:
>
> ceph osd crush add rule -i new-rule-file
>
> Could I add the command into Ceph?
We love contributions to Ceph, and this is an obvious hole in our
atomic CLI-based CRUSH manipulation for which a fix would be welcome.
Please be aware that there was a significant overhaul to the way these
commands are processed internally between Cuttlefish and
Dumpling-to-be that you'll need to deal with if you want to cross that
boundary. I also recommend looking carefully at how we do the
individual pool changes and how we handle whole-map injection to make
sure the interface you use and the places you do data extraction make
sense. :)
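For reference, the get / decompile / edit / compile / set cycle discussed in this thread can be sketched as a small Python wrapper. This is only a sketch under stated assumptions: the helper names, the `local_<host>` rule name, and the ruleset number are hypothetical; the rule body is one plausible way to restrict a pool to the OSDs under a single host bucket; and appending the rule to the end of the decompiled text is assumed to be accepted by `crushtool -c`. The sequence is still a read-modify-write, so it does not remove the race described above and should run on one node at a time.

```python
import os
import subprocess
import tempfile
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], object]


def render_local_rule(host: str, ruleset: int) -> str:
    """Render CRUSH rule text that picks OSDs only under bucket `host`.

    The rule name and ruleset number are illustrative; the ruleset id
    must not collide with one already present in the map.
    """
    lines = [
        f"rule local_{host} {{",
        f"\truleset {ruleset}",
        "\ttype replicated",
        "\tmin_size 1",
        "\tmax_size 10",
        f"\tstep take {host}",
        "\tstep choose firstn 0 type osd",
        "\tstep emit",
        "}",
    ]
    return "\n".join(lines) + "\n"


def add_crush_rule(rule_text: str,
                   run: Runner = subprocess.check_call,
                   workdir: Optional[str] = None) -> str:
    """Fetch the CRUSH map, append `rule_text`, and inject the result.

    `run` executes one command given as an argv list; it is injectable
    so the sequence can be exercised without a live cluster.
    Returns the path of the edited text map.
    """
    workdir = workdir or tempfile.mkdtemp(prefix="crush-")
    binary_map = os.path.join(workdir, "crush-map")
    text_map = os.path.join(workdir, "crush-map.txt")
    new_map = os.path.join(workdir, "new-crush-map")

    run(["ceph", "osd", "getcrushmap", "-o", binary_map])   # fetch binary map
    run(["crushtool", "-d", binary_map, "-o", text_map])    # decompile to text
    with open(text_map, "a") as handle:                     # append the new rule
        handle.write("\n" + rule_text)
    run(["crushtool", "-c", text_map, "-o", new_map])       # recompile
    run(["ceph", "osd", "setcrushmap", "-i", new_map])      # inject new map
    return text_map
```

On a control node this would be called as `add_crush_rule(render_local_rule("node1", 3))`, where `node1` must match an existing host bucket in the map and `3` is an unused ruleset id.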
Thank you for your quick reply, it is very useful for me :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
--
Rongze Zhu - 朱荣泽
Email: zrzhit@xxxxxxxxx
Blog: http://way4ever.com
Weibo: http://weibo.com/metaxen
Github: https://github.com/zhurongze
--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com