Re: Erasure pool performance expectations

Thanks, I tried that earlier, but so far I am still getting slow requests. I also found that writeback wasn't enabled on my hardware controller. After changing that and setting the max bytes, things are a bit more stable, with fewer slow requests popping up. The fact that enabling writeback seemed to help makes me think the disks are causing the issues after all. I will see if I can diagnose this a bit further. Plotting some commit_latency_ms graphs and filtering for the slowest OSDs consistently points to the same disks, so I might remove those OSDs from the cluster to see what happens.
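In case it's useful, something along these lines should surface the slowest OSDs without graphing (a rough sketch; the sort column may differ between versions):

# per-OSD commit/apply latency in ms, slowest first
ceph osd perf | sort -rn -k2 | head

# the same data as JSON, for feeding into graphs
ceph osd perf -f json-pretty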

Thanks for all your help so far!

On Wed, May 11, 2016 at 9:07 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:

Hi Peter, yes just restart the OSD for the setting to take effect.
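
For example, on a systemd-based setup something like this should do it (unit names are an assumption, adjust for your init system):

# restart one OSD so it picks up the new ceph.conf value
systemctl restart ceph-osd@0

# or restart every OSD on this host
systemctl restart ceph-osd.target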

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Peter Kerdisle
Sent: 10 May 2016 19:06
To: Nick Fisk <nick@xxxxxxxxxx>


Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Erasure pool performance expectations

 

Thanks Nick. I added it to my ceph.conf. I'm guessing this is an OSD setting and that I therefore need to restart my OSDs, is that correct?

 

On Tue, May 10, 2016 at 3:48 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:



> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Peter Kerdisle
> Sent: 10 May 2016 14:37
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Erasure pool performance expectations
>
> To answer my own question it seems that you can change settings on the fly
> using
>
> ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 5242880'
> osd.0: osd_tier_promote_max_bytes_sec = '5242880' (unchangeable)
>
> However the response seems to imply I can't change this setting. Is there
> another way to change these settings?

Sorry Peter, I missed your last email. You can also specify that setting in ceph.conf; for example, I have this in mine:

osd_tier_promote_max_bytes_sec = 4000000
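
That line would typically sit under the [osd] section (or [global]), and after restarting you can confirm the running value through the admin socket on the OSD's host (osd.0 here is just an example):

[osd]
osd_tier_promote_max_bytes_sec = 4000000

ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec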




>
>
> On Sun, May 8, 2016 at 2:37 PM, Peter Kerdisle <peter.kerdisle@xxxxxxxxx>
> wrote:
> Hey guys,
>
> I noticed the merge request that fixes the switched-around settings here:
> https://github.com/ceph/ceph/pull/8912
>
> I had two questions:
>
> • Does this affect my performance in any way? Could it explain the slow
> requests I keep having?
> • Can I modify these settings manually on my cluster?
> Thanks,
>
> Peter
>
>
> On Fri, May 6, 2016 at 9:58 AM, Peter Kerdisle <peter.kerdisle@xxxxxxxxx>
> wrote:
> Hey Mark,
>
> Sorry I missed your message as I'm only subscribed to daily digests.
>
> Date: Tue, 3 May 2016 09:05:02 -0500
> From: Mark Nelson <mnelson@xxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Erasure pool performance expectations
> Message-ID: <df3de049-a7f9-7f86-3ed3-47079e4012b9@xxxxxxxxxx>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> In addition to what nick said, it's really valuable to watch your cache
> tier write behavior during heavy IO.  One thing I noticed is you said
> you have 2 SSDs for journals and 7 SSDs for data.
>
> I thought the hardware recommendation was one journal disk per 3 or 4 data
> disks, but I might have misunderstood it. Looking at my journal
> reads/writes they seem to be OK,
> though: https://www.dropbox.com/s/er7bei4idd56g4d/Screenshot%202016-
> 05-06%2009.55.30.png?dl=0
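> 
> A quick cross-check outside of graphs is to watch the raw devices while the
> cluster is under load, e.g. (device names here are placeholders for the
> journal and data SSDs):
> 
> iostat -xm 1 sdb sdc sdd
> 
> If the journal devices sit near 100% utilisation while the data disks stay
> mostly idle, the journals are the write bottleneck.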
>
> However I started running into a lot of slow requests (made a separate
> thread for those: Diagnosing slow requests) and now I'm hoping these could
> be related to my journaling setup.
>
> If they are all of
> the same type, you're likely bottlenecked by the journal SSDs for
> writes, which compounded with the heavy promotions is going to really
> hold you back.
> What you really want:
> 1) (assuming filestore) equal large write throughput between the
> journals and data disks.
> How would one achieve that?
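> 
> As a rough back-of-the-envelope sketch (throughput numbers are illustrative,
> not measured): with filestore every write hits a journal first, so two
> journal SSDs at ~400 MB/s each give ~800 MB/s of journal bandwidth, while
> seven data SSDs at ~350 MB/s each could absorb ~2450 MB/s. In that scenario
> the journals cap sustained writes at roughly a third of what the data disks
> could take, so matching them means faster or additional journal devices, or
> colocating the journals on the data SSDs.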
>
> 2) promotions to be limited by some reasonable fraction of the cache
> tier and/or network throughput (say 70%).  This is why the
> user-configurable promotion throttles were added in jewel.
> Are these already in the docs somewhere?
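> 
> (For reference, the Jewel throttles referred to here should be the
> osd_tier_promote_max_bytes_sec / osd_tier_promote_max_objects_sec pair
> discussed above, set in the [osd] section or via injectargs; the values
> below are purely examples to tune, not recommended defaults.)
> 
> [osd]
> osd_tier_promote_max_bytes_sec = 4000000
> osd_tier_promote_max_objects_sec = 25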
>
> 3) The cache tier to fill up quickly when empty but change slowly once
> it's full (ie limiting promotions and evictions).  No real way to do
> this yet.
> Mark
>
> Thanks for your thoughts.
>
> Peter
>
>

 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
