Hello,

On Wed, 6 Apr 2016 20:35:20 +0200 Oliver Dzombic wrote:

> Hi,
>
> I have some IO issues, and after Christian's great article/hint about
> caches I plan to add caches too.
>
Thanks, version 2 is still a work in progress, as I keep running into
unknowns.

IO issues in what sense, as in too many write IOPS for the current HW to
sustain? Also, what are you using Ceph for, RBD hosting VM images?

It will help you a lot if you can identify and quantify the usage
patterns (including a rough idea of how many hot objects you have) and
where you run into limits.

> So now comes the troublesome question:
>
> How dangerous is it to add cache tiers to an existing cluster with
> around 30 OSDs and 40 TB of data on 3-6 (currently reducing) nodes?
>
You're reducing nodes? Why? More nodes/OSDs equates to more IOPS in
general.

40 TB is a sizable amount of data; how many objects does your cluster
hold? Also, is that raw data or after replication (size 3)?
In short, "ceph -s" output please. ^.^ (see the P.S. below for a few
more commands to gather those numbers)

> I mean, will everything just explode and I just die, or what is the
> road map for introducing this once you already have a running cluster?
>
That part is pretty straightforward, see the Ceph docs at:
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
(replace "master" with "hammer" if you're running that)

Nothing happens until the "set-overlay" step, and you will want to
configure all the pertinent bits before that (a rough command sequence
is sketched in the P.S. below as well).

A basic question is whether you will have dedicated SSD cache tier hosts
or put the SSDs holding the cache pool into your current hosts.
Dedicated hosts have the advantage of matched HW (CPU power sized to the
SSDs) and simpler configuration; shared hosts can have the advantage of
spreading the network load further out instead of funneling everything
through the cache tier nodes.

The size and length of the explosion will entirely depend on:
1) how capable your current cluster is and how (over)loaded it is,
2) the actual load/usage at the time you phase the cache tier in,
3) the amount of "truly hot" objects you have.

As I wrote here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007933.html
in my case, with a BADLY overloaded base pool and a constant stream of
log/status writes (4-5 MB/s, 1000 IOPS) from 200 VMs, it all stabilized
after 10 minutes.

The truly hot objects mentioned above will be those (in the case of VM
images) holding active directory inodes and files.

> Anything that needs to be considered? Dangerous no-nos?
>
> Also, it will happen that I have to add the cache tier servers one by
> one, and not all at the same time.
>
You want at least 2 cache tier servers from the start and well-known,
well-tested (LSI timeouts!) SSDs in them.

Christian

> I am happy for any kind of advice.
>
> Thank you!
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
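
P.S.: A few commands (beyond "ceph -s") that will give you the numbers
asked for above. These are all standard Ceph/RADOS CLI calls; the pool
name "rbd" is just an example, substitute your own:

  # Overall cluster state, plus raw versus per-pool usage:
  ceph -s
  ceph df detail

  # Per-pool object counts and current IO rates:
  rados df

  # Replication size of the pool in question:
  ceph osd pool get rbd size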
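
The road map from the docs above boils down to something like the
following sketch. All names and numbers are placeholders ("rbd" as the
base pool, "cache" as the cache pool, PG counts and thresholds sized for
nothing in particular), so adapt them to your cluster before even
thinking about running this:

  # Create the cache pool on the SSD OSDs (PG count is a placeholder):
  ceph osd pool create cache 128 128

  # Attach it to the base pool in writeback mode; clients are not
  # affected yet:
  ceph osd tier add rbd cache
  ceph osd tier cache-mode cache writeback

  # Configure the pertinent bits BEFORE enabling the overlay:
  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache hit_set_count 1
  ceph osd pool set cache hit_set_period 3600
  # Absolute size limit of the cache pool, 1 TB as an example:
  ceph osd pool set cache target_max_bytes 1099511627776
  # Start flushing dirty objects at 40% full, evict at 80%:
  ceph osd pool set cache cache_target_dirty_ratio 0.4
  ceph osd pool set cache cache_target_full_ratio 0.8

  # THIS is the step where client traffic starts going through the
  # cache tier:
  ceph osd tier set-overlay rbd cache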
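
If you go with dedicated cache tier hosts, you will also need a CRUSH
rule that pins the cache pool to their SSDs. Something along these
lines, where the bucket, host and rule names are again made up for
illustration:

  # Put the SSD hosts under their own CRUSH root:
  ceph osd crush add-bucket ssd root
  ceph osd crush move cache-host1 root=ssd
  ceph osd crush move cache-host2 root=ssd

  # A simple rule picking OSDs from that root, host as failure domain:
  ceph osd crush rule create-simple ssd-rule ssd host

  # Point the cache pool at it; look up the actual ruleset number with
  # "ceph osd crush rule dump ssd-rule":
  ceph osd pool set cache crush_ruleset 1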
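
As for well-tested SSDs, one quick sanity check is a direct, sync 4K
write run with fio against a scratch device (careful: this example
WRITES to /dev/sdX, so only point it at something you can afford to
lose):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --name=journal-test

The good DC-grade SSDs sustain this in the tens of thousands of IOPS;
anything that collapses to a few hundred is a poor fit for a cache tier.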