Re: Erasure pool performance expectations

I'm forcing a flush by lowering cache_target_dirty_ratio. This forces writes to the EC pool, and those are the operations I'm trying to throttle a bit. Am I understanding you correctly that the throttling only works the other way around, i.e. for promoting cold objects into the hot cache?
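For reference, this is roughly what I'm doing to trigger the flush (the pool name is just a placeholder for my actual cache pool):

```shell
# Lower the dirty ratio on the cache pool to force flushing to the EC pool.
ceph osd pool set hot-cache cache_target_dirty_ratio 0.4

# Raise it back to the previous value once the flush has caught up.
ceph osd pool set hot-cache cache_target_dirty_ratio 0.6
```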

Measurement is a problem for me at the moment. I'm trying to get the perf dumps into collectd/graphite, but it seems I need to hand-roll a solution since the plugins I found no longer work. What I'm doing now is just summing the bandwidth statistics from my nodes to get an approximate number. I hope to make some time this week to write a collectd plugin that fetches the actual stats from the perf dumps.
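The rough shape of what I have in mind is something like the sketch below: poll each OSD's admin socket and pull out the tier counters. The counter names are from a Jewel-era perf dump and may differ between releases, so treat them as placeholders.

```python
import json
import subprocess


def tier_counters(perf):
    """Pull the cache-tier counters out of a parsed perf dump.

    Counter names ("tier_promote" etc.) are from a Jewel-era dump;
    verify them against your own `perf dump` output."""
    osd = perf.get("osd", {})
    return {k: osd.get(k, 0) for k in ("tier_promote", "tier_flush", "tier_evict")}


def perf_dump(osd_id):
    """Fetch and parse a perf dump via the OSD's admin socket."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    return json.loads(out)


def poll(osd_ids):
    """Print tier counters per OSD; a collectd plugin would dispatch
    these as values instead of printing them."""
    for i in osd_ids:
        print(i, tier_counters(perf_dump(i)))
```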

I confirmed the settings are indeed correctly picked up across the nodes in the cluster. 

I tried switching my pool to readforward, since for my needs the EC pool is fast enough for reads, but I got scared off by the warning about data corruption. How safe is readforward really at this point? I noticed the option was removed from the latest docs while still living on in the Google-cached version: http://webcache.googleusercontent.com/search?q=cache:http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
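In case anyone wants to reproduce, this is the sort of command that triggers the warning (again, the pool name is a placeholder):

```shell
# Jewel refuses the switch without the extra flag and prints the
# data-corruption warning I mentioned.
ceph osd tier cache-mode hot-cache readforward --yes-i-really-mean-it

# To switch back:
ceph osd tier cache-mode hot-cache writeback
```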



On Mon, May 16, 2016 at 11:14 AM Nick Fisk <nick@xxxxxxxxxx> wrote:
> -----Original Message-----
> From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> Sent: 15 May 2016 08:04
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Erasure pool performance expectations
>
> Hey Nick,
>
> I've been playing around with the osd_tier_promote_max_bytes_sec setting
> but I'm not really seeing any changes.
>
> What would be the expected behaviour when setting a max bytes value? I
> would expect my OSDs to throttle themselves to this rate when doing
> promotions, but this doesn't seem to be the case. When I set it to 2MB I
> would expect a node with 10 OSDs to do a max of 20MB/s during promotions.
> Is this math correct?

Yes, that sounds about right, but this will only apply to optional promotions (i.e. reads that meet the recency/hitset settings). Any writes will force the object to be promoted, since you can't write directly to an EC pool. Also don't forget that once the cache pool is full, it will start evicting/flushing cold objects for every new object that gets promoted into it.

A few questions:

1. What promotion rates are you seeing?

2. How are you measuring the promotion rate just out of interest?

3. Can you confirm that the OSD is picking up that setting correctly by running something like (sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep promote)?

>
> Thanks,
>
> Peter
>
> On Tue, May 10, 2016 at 3:48 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> Of
> > Peter Kerdisle
> > Sent: 10 May 2016 14:37
> > Cc: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: Erasure pool performance expectations
> >
> > To answer my own question it seems that you can change settings on the
> fly
> > using
> >
> > ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 5242880'
> > osd.0: osd_tier_promote_max_bytes_sec = '5242880' (unchangeable)
> >
> However the response seems to imply I can't change this setting. Is there
> another way to change these settings?
>
> Sorry Peter, I missed your last email. You can also specify that setting in the
> ceph.conf, ie I have in mine
>
> osd_tier_promote_max_bytes_sec = 4000000
>
>
>
> >
> >
> > On Sun, May 8, 2016 at 2:37 PM, Peter Kerdisle
> <peter.kerdisle@xxxxxxxxx>
> > wrote:
> > Hey guys,
> >
> > I noticed the merge request that fixes the switch around here
> > https://github.com/ceph/ceph/pull/8912
> >
> > I had two questions:
> >
> > • Does this affect my performance in any way? Could it explain the slow
> > requests I keep having?
> > • Can I modify these settings manually myself on my cluster?
> > Thanks,
> >
> > Peter
> >
> >
> > On Fri, May 6, 2016 at 9:58 AM, Peter Kerdisle <peter.kerdisle@xxxxxxxxx>
> > wrote:
> > Hey Mark,
> >
> > Sorry I missed your message as I'm only subscribed to daily digests.
> >
> > Date: Tue, 3 May 2016 09:05:02 -0500
> > From: Mark Nelson <mnelson@xxxxxxxxxx>
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: Erasure pool performance expectations
> > Message-ID: <df3de049-a7f9-7f86-3ed3-47079e4012b9@xxxxxxxxxx>
> > Content-Type: text/plain; charset=windows-1252; format=flowed
> > In addition to what nick said, it's really valuable to watch your cache
> > tier write behavior during heavy IO.  One thing I noticed is you said
> > you have 2 SSDs for journals and 7 SSDs for data.
> >
> > I thought the hardware recommendations were 1 journal disk per 3 or 4
> > data disks, but I think I might have misunderstood it. Looking at my
> > journal read/writes they seem to be OK though:
> > https://www.dropbox.com/s/er7bei4idd56g4d/Screenshot%202016-05-06%2009.55.30.png?dl=0
> >
> > However I started running into a lot of slow requests (made a separate
> > thread for those: Diagnosing slow requests) and now I'm hoping these
> > could be related to my journaling setup.
> >
> > If they are all of
> > the same type, you're likely bottlenecked by the journal SSDs for
> > writes, which compounded with the heavy promotions is going to really
> > hold you back.
> > What you really want:
> > 1) (assuming filestore) equal large write throughput between the
> > journals and data disks.
> > How would one achieve that?
> >
> > 2) promotions to be limited by some reasonable fraction of the cache
> > tier and/or network throughput (say 70%).  This is why the
> > user-configurable promotion throttles were added in jewel.
> > Are these already in the docs somewhere?
> >
> > 3) The cache tier to fill up quickly when empty but change slowly once
> > it's full (ie limiting promotions and evictions).  No real way to do
> > this yet.
> > Mark
> >
> > Thanks for your thoughts.
> >
> > Peter
> >
> >
>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
