Hello,

On Mon, 16 May 2016 12:49:48 +0200 Peter Kerdisle wrote:

> Thanks yet again Nick for the help and explanations. I will experiment
> some more and see if I can get the slow requests further down and
> increase the overall performance.
>
And as I probably mentioned before, if your cache is large enough, you
can lower the dirty ratio during off-peak hours and get no flushes (they
are the expensive ones, not the evictions) during peak hours.

In my use case a typical day doesn't incur more than 3% cache
promotions, so I drop the dirty ratio from .60 to .57 for 10 minutes
before midnight.

Christian

> On Mon, May 16, 2016 at 12:20 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> > > Sent: 16 May 2016 11:04
> > > To: Nick Fisk <nick@xxxxxxxxxx>
> > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > Subject: Re: Erasure pool performance expectations
> > >
> > > On Mon, May 16, 2016 at 11:58 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > > > -----Original Message-----
> > > > From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> > > > Sent: 16 May 2016 10:39
> > > > To: nick@xxxxxxxxxx
> > > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > > Subject: Re: Erasure pool performance expectations
> > > >
> > > > I'm forcing a flush by setting cache_target_dirty_ratio to a lower
> > > > value. This forces writes to the EC pool, and these are the
> > > > operations I'm trying to throttle a bit. Am I understanding you
> > > > correctly that this throttling only works the other way around,
> > > > i.e. for promoting cold objects into the hot cache?
> > >
> > > Yes, that's correct. You want to throttle the flushes, which is done
> > > by another setting (or two).
> > >
> > > Firstly, set something like this in your ceph.conf:
> > > osd_agent_max_low_ops = 1
> > > osd_agent_max_ops = 4
> > >
> > > I did not know about this, that's great. I will play around with
> > > these.
> > >
> > > This controls how many parallel threads the tiering agent will use.
> > > You can bump them up later if needed.
> > >
> > > Next, set these two settings on your cache pools. Try and keep them
> > > about .2 apart, so something like .4 and .6 is good to start with:
> > > cache_target_dirty_ratio
> > > cache_target_dirty_high_ratio
> > >
> > > Here is actually the heart of the matter. Ideally I would love to run
> > > it at 0.0, if that makes sense: I want no dirty objects in my hot
> > > cache at all. Has anybody ever tried this? Right now I'm just pushing
> > > cache_target_dirty_ratio down to 0.2 during low-activity moments and
> > > then bringing it back up to 0.6 when it's done or activity starts up
> > > again.
> >
> > You might want to rethink that slightly. Keep in mind that with EC
> > pools any write currently forces a promotion and then dirties the
> > object. If you are then almost immediately flushing the objects back
> > down, you are going to end up with a lot of write amplification. You
> > want to keep dirty objects in the cache pool so that you don't incur
> > this penalty if they are going to be written to again in the near
> > future.
> >
> > I'm guessing what you want is a buffer, so that you can have bursts of
> > activity without incurring the performance penalty of flushing? That's
> > hopefully what the high and low flush ratios should give you.
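To make the settings discussed above concrete, here is a minimal sketch, assuming a hypothetical cache pool named "hot-cache" (the pool name and exact values are placeholders, not taken from the thread):

    # ceph.conf on the OSD hosts -- tiering agent parallelism
    [osd]
    osd_agent_max_low_ops = 1
    osd_agent_max_ops = 4

    # Flush thresholds on the cache pool, adjustable at runtime
    ceph osd pool set hot-cache cache_target_dirty_ratio 0.4
    ceph osd pool set hot-cache cache_target_dirty_high_ratio 0.6

    # Christian's off-peak variant: briefly lower the dirty ratio to
    # trigger a gentle flush, then restore it before load picks up again
    ceph osd pool set hot-cache cache_target_dirty_ratio 0.57
    ceph osd pool set hot-cache cache_target_dirty_ratio 0.60

The agent settings should also be injectable at runtime with "ceph tell osd.* injectargs", though restarting the OSDs is the surer way to make them stick.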
> > By setting these to .4 and .6, you will have a 0.2 x "cache tier
> > capacity" buffer of cache tier space where only slow flushing will
> > occur.
> > >
> > > And let me know if that helps.
> > >
> > > > The measurement is a problem for me at the moment. I'm trying to
> > > > get the perf dumps into collectd/graphite, but it seems I need to
> > > > hand-roll a solution since the plugins I found are not working any
> > > > more. What I'm doing now is just summing the bandwidth statistics
> > > > from my nodes to get an approximate number. I hope to make some
> > > > time this week to write a collectd plugin to fetch the actual stats
> > > > from the perf dumps.
> > >
> > > I've used diamond to collect the stats and it worked really well. I
> > > can share my graphite query to sum the promote/flush rates as well if
> > > it helps?
> > >
> > > I will check out diamond, are you using this specifically?
> > > https://github.com/BrightcoveOS/Diamond/wiki/collectors-CephCollector
> > >
> > > It would be great if you could share your graphite queries :)
> >
> > You will need to modify them to suit your environment, but in the
> > examples below 4 and 7 are the SSDs in the cache tier. You can probably
> > change the "Ceph-Test" server to "*" to catch all servers.
> >
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.op_r)),1),"SSD R IOP/s")
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.tier_proxy_read)),1),"Proxy Reads/s")
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.tier_try_flush)),1),"Flushes/s")
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.tier_promote)),1),"Promotions/s")
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.tier_evict)),1),"Evictions/s")
> > alias(scaleToSeconds(nonNegativeDerivative(sumSeries(servers.Ceph-Test.CephCollector.ceph.osd.[4|7].osd.tier_proxy_write)),1),"Proxy Writes/s")
> > alias(scale(servers.Ceph-Test.CephPoolsCollector.ceph.pools.cache2.MB_used,0.001024),"GB Cache Used")
> > alias(scale(servers.Ceph-Test.CephPoolsCollector.ceph.pools.cache2.target_max_MB,0.001024),"Target Max GB")
> > alias(scale(servers.Ceph-Test.CephPoolsCollector.ceph.pools.cache2.dirty_objects,0.004),"Dirty GB")
> >
> > > > I confirmed the settings are indeed correctly picked up across the
> > > > nodes in the cluster.
> > >
> > > Good, glad we got that sorted.
> > >
> > > > I tried switching my pool to readforward, since for my needs the EC
> > > > pool is fast enough for reads, but I got scared when I got the
> > > > warning about data corruption. How safe is readforward really at
> > > > this point? I noticed the option was removed from the latest docs
> > > > while still living on in the Google-cached version:
> > > > http://webcache.googleusercontent.com/search?q=cache:http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> > >
> > > Not too sure about the safety, but I'm of the view that those extra
> > > modes probably aren't needed; I'm pretty sure the same effect can be
> > > controlled via the recency settings (someone correct me please).
> > > The higher the recency settings, the less likely an object will be
> > > chosen to be promoted into the cache tier. If you set the min_recency
> > > for reads to be higher than the max hit_set count, then in theory no
> > > reads will ever cause an object to be promoted.
> > >
> > > You are right, your earlier help made me do exactly that, and things
> > > have been working better since.
> > >
> > > Thanks!


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
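To put the recency advice above into commands: a minimal sketch, assuming a hypothetical cache pool named "hot-cache" that keeps 4 HitSets (check your own pool's values first):

    # See how many HitSets the cache pool keeps
    ceph osd pool get hot-cache hit_set_count

    # Require an object to be seen in more HitSets than actually exist,
    # so a read on its own never triggers a promotion
    ceph osd pool set hot-cache min_read_recency_for_promote 5

As Nick points out earlier in the thread, writes against an EC base pool will still force a promotion regardless, so this only takes reads out of the promotion path.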
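And on the measurement question from earlier in the thread: until a proper collectd plugin exists, something along these lines can pull the tiering counters straight from a local OSD admin socket; they are the same counters the graphite queries above are summing. This is only a rough Python sketch under those assumptions, not a finished collector; adjust the OSD ids and error handling for your environment.

    #!/usr/bin/env python
    # Rough sketch: read cache-tier counters from local OSD admin sockets
    # via "ceph daemon osd.N perf dump".
    import json
    import subprocess

    TIER_KEYS = ["tier_promote", "tier_try_flush", "tier_evict",
                 "tier_proxy_read", "tier_proxy_write"]

    def tier_counters(osd_id):
        """Return the cumulative tiering counters for one local OSD."""
        out = subprocess.check_output(
            ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
        osd_section = json.loads(out.decode("utf-8"))["osd"]
        return dict((key, osd_section.get(key, 0)) for key in TIER_KEYS)

    if __name__ == "__main__":
        # OSDs 4 and 7 are the cache-tier SSDs in the graphite examples
        # above; substitute your own ids. The values are cumulative, so a
        # collector should report per-interval differences (derivatives).
        for osd in (4, 7):
            print("osd.%d: %s" % (osd, tier_counters(osd)))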