Re: Erasure pool performance expectations

> -----Original Message-----
> From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> Sent: 16 May 2016 10:39
> To: nick@xxxxxxxxxx
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Erasure pool performance expectations
> 
> I'm forcing a flush by lowering cache_target_dirty_ratio to a lower value.
> This forces writes to the EC pool; these are the operations I'm trying to
> throttle a bit. Am I understanding you correctly that this throttling only
> works the other way around, i.e. promoting cold objects into the hot cache?

Yes, that's correct. You want to throttle the flushes, which is done by a couple of other settings.

Firstly, set something like this in your ceph.conf:
osd_agent_max_low_ops = 1
osd_agent_max_ops = 4

These control how many parallel threads the tiering agent will use; you can bump them up later if needed.
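
If you don't want to wait for a restart, you can also try injecting them at runtime, something like the below (they may come back as (unchangeable) like the promote setting did for you, in which case ceph.conf plus an OSD restart it is):

ceph tell osd.* injectargs '--osd_agent_max_ops 4 --osd_agent_max_low_ops 1'
sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep agent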

Next, set these two settings on your cache pool. Try to keep them about 0.2 apart; something like 0.4 and 0.6 is a good starting point (example commands below):
cache_target_dirty_ratio
cache_target_dirty_high_ratio
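
For example, something along these lines, where "hot-pool" is just a placeholder for your cache pool name:

ceph osd pool set hot-pool cache_target_dirty_ratio 0.4
ceph osd pool set hot-pool cache_target_dirty_high_ratio 0.6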

And let me know if that helps.



> 
> The measurement is a problem for me at the moment. I'm trying to get the
> perf dumps into collectd/graphite but it seems I need to hand-roll a solution
> since the plugins I found are not working anymore. What I'm doing now is
> just summing the bandwidth statistics from my nodes to get an
> approximate number. I hope to make some time this week to write a
> collectd plugin to fetch the actual stats from the perf dumps.

I've used diamond to collect the stats and it worked really well. I can share my graphite query to sum the promote/flush rates as well if it helps?
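
If you do end up rolling your own collector, the numbers come straight off the OSD admin sockets; a rough sketch for osd.0 (the exact tier_* counter names may vary between releases, so check your own perf dump output first):

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | grep -E 'tier_promote|tier_flush|tier_evict'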

> 
> I confirmed the settings are indeed correctly picked up across the nodes in
> the cluster.

Good, glad we got that sorted

> 
> I tried switching my pool to readforward since, for my needs, the EC pool is
> fast enough for reads, but I got scared when I saw the warning about data
> corruption. How safe is readforward really at this point? I noticed the option
> was removed from the latest docs while still living on in the Google-cached
> version:
> http://webcache.googleusercontent.com/search?q=cache:http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

Not too sure about the safety, but I'm of the view that those extra modes probably aren't needed; I'm pretty sure the same effect can be achieved via the recency settings (someone correct me please). The higher the recency settings, the less likely an object is to be promoted into the cache tier. If you set the min recency for reads higher than the hit_set count, then in theory no reads will ever cause an object to be promoted. A rough example is below.
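
As an untested sketch of what I mean ("hot-pool" again being a placeholder for your cache pool name):

ceph osd pool set hot-pool hit_set_count 4
ceph osd pool set hot-pool min_read_recency_for_promote 5

i.e. with the read recency set higher than the number of hit sets, a read alone should never trigger a promotion; writes will still promote because you can't write directly to the EC base pool.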

Nick

> 
> 
> 
> On Mon, May 16, 2016 at 11:14 AM Nick Fisk <nick@xxxxxxxxxx> wrote:
> > -----Original Message-----
> > From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> > Sent: 15 May 2016 08:04
> > To: Nick Fisk <nick@xxxxxxxxxx>
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re:  Erasure pool performance expectations
> >
> > Hey Nick,
> >
> > I've been playing around with the osd_tier_promote_max_bytes_sec setting
> > but I'm not really seeing any changes.
> >
> > What would be expected when setting a max bytes value? I would expect
> > my OSDs to throttle themselves to this rate when doing promotes,
> > but this doesn't seem to be the case. When I set it to 2MB I would expect a
> > node with 10 OSDs to do a max of 20MB/s during promotions. Is this math
> > correct?
> 
> Yes that sounds about right, but this will only be for optional promotions (ie
> reads that meet the recency/hitset settings). If you are doing any writes,
> they will force the object to be promoted as you can't directly write to an EC
> pool. And also don't forget that once the cache pool is full, it will start
> evicting/flushing cold objects for every new object that gets promoted into
> it.
> 
> Few questions
> 
> 1. What promotion rates are you seeing?
> 
> 2. How are you measuring the promotion rate just out of interest?
> 
> 3. Can you confirm that the OSD is picking up that setting correctly by running
> something like (sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok
> config show | grep promote)?
> 
> >
> > Thanks,
> >
> > Peter
> >
> > On Tue, May 10, 2016 at 3:48 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> >
> > > -----Original Message-----
> > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter Kerdisle
> > > Sent: 10 May 2016 14:37
> > > Cc: ceph-users@xxxxxxxxxxxxxx
> > > Subject: Re:  Erasure pool performance expectations
> > >
> > > To answer my own question it seems that you can change settings on the fly
> > > using
> > >
> > > ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 5242880'
> > > osd.0: osd_tier_promote_max_bytes_sec = '5242880' (unchangeable)
> > >
> > > However the response seems to imply I can't change this setting. Is there
> > > another way to change these settings?
> >
> > Sorry Peter, I missed your last email. You can also specify that setting in
> > ceph.conf, e.g. I have this in mine:
> >
> > osd_tier_promote_max_bytes_sec = 4000000
> >
> >
> >
> > >
> > >
> > > On Sun, May 8, 2016 at 2:37 PM, Peter Kerdisle <peter.kerdisle@xxxxxxxxx> wrote:
> > > Hey guys,
> > >
> > > I noticed the merge request that fixes the switch around here
> > > https://github.com/ceph/ceph/pull/8912
> > >
> > > I had two questions:
> > >
> > > • Does this affect my performance in any way? Could it explain the slow
> > > requests I keep having?
> > > • Can I modify these settings manually on my cluster?
> > > Thanks,
> > >
> > > Peter
> > >
> > >
> > > On Fri, May 6, 2016 at 9:58 AM, Peter Kerdisle <peter.kerdisle@xxxxxxxxx> wrote:
> > > Hey Mark,
> > >
> > > Sorry I missed your message as I'm only subscribed to daily digests.
> > >
> > > Date: Tue, 3 May 2016 09:05:02 -0500
> > > From: Mark Nelson <mnelson@xxxxxxxxxx>
> > > To: ceph-users@xxxxxxxxxxxxxx
> > > Subject: Re:  Erasure pool performance expectations
> > > Message-ID: <df3de049-a7f9-7f86-3ed3-47079e4012b9@xxxxxxxxxx>
> > > Content-Type: text/plain; charset=windows-1252; format=flowed
> > > In addition to what nick said, it's really valuable to watch your cache
> > > tier write behavior during heavy IO.  One thing I noticed is you said
> > > you have 2 SSDs for journals and 7 SSDs for data.
> > >
> > > I thought the hardware recommendations were 1 journal disk per 3 or 4 data
> > > disks but I think I might have misunderstood it. Looking at my journal
> > > read/writes they seem to be OK though:
> > > https://www.dropbox.com/s/er7bei4idd56g4d/Screenshot%202016-05-06%2009.55.30.png?dl=0
> > >
> > > However I started running into a lot of slow requests (made a separate
> > > thread for those: Diagnosing slow requests) and now I'm hoping these
> > > could be related to my journaling setup.
> > >
> > > If they are all of
> > > the same type, you're likely bottlenecked by the journal SSDs for
> > > writes, which compounded with the heavy promotions is going to really
> > > hold you back.
> > > What you really want:
> > > 1) (assuming filestore) equal large write throughput between the
> > > journals and data disks.
> > > How would one achieve that?
> > >
> > > 2) promotions to be limited by some reasonable fraction of the cache
> > > tier and/or network throughput (say 70%).  This is why the
> > > user-configurable promotion throttles were added in jewel.
> > > Are these already in the docs somewhere?
> > >
> > > 3) The cache tier to fill up quickly when empty but change slowly once
> > > it's full (ie limiting promotions and evictions).  No real way to do
> > > this yet.
> > > Mark
> > >
> > > Thanks for your thoughts.
> > >
> > > Peter
> > >
> > >
> >


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



