Re: Erasure pool performance expectations

Thank you. I will attempt to play around with these settings and see if I can achieve better read performance.

Appreciate your insights.

Peter

On Tue, May 3, 2016 at 3:00 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:


> -----Original Message-----
> From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> Sent: 03 May 2016 12:15
> To: nick@xxxxxxxxxx
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Erasure pool performance expectations
>
> Hey Nick,
>
> Thanks for taking the time to answer my questions. Some in-line comments.
>
> On Tue, May 3, 2016 at 10:51 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Peter,
>
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> > Peter Kerdisle
> > Sent: 02 May 2016 08:17
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject: Erasure pool performance expectations
> >
> > Hi guys,
> >
> > I am currently testing the performance of RBD using a cache pool and a 4/2
> > erasure profile pool.
> >
> > I have two SSD cache servers (2 SSDs for journals, 7 SSDs for data) with
> > 2x10Gbit bonded each, and six OSD nodes with a 10Gbit public and 10Gbit
> > cluster network for the erasure pool (10x3TB without separate journal).
> > This is all on Jewel.
> >
> > What I would like to know is whether the performance I'm seeing is to be
> > expected, and if there is some way to test this in a more quantifiable way.
> >
> > Everything works as expected if the files are present on the cache pool;
> > however, when things need to be retrieved from the erasure pool I see
> > performance degradation. I'm trying to simulate real usage as much as
> > possible by retrieving files from the RBD volume over FTP from a
> > client server. What I'm seeing is that the FTP transfer will stall for seconds
> > at a time and then get some more data, which results in an average speed of
> > 200KB/s. From the cache this is closer to 10MB/s. Is this the expected
> > behaviour from an erasure coded tier with a cache in front?
>
> Unfortunately, yes. The whole erasure/cache combination only really works well
> if the data in the EC tier is accessed infrequently; otherwise the overhead of
> cache promotion/flushing quickly brings the cluster to its knees.
> However, it looks as though you are mainly doing reads, which means you can
> probably alter your cache settings to not promote so aggressively on reads,
> as reads can be proxied through to the EC tier instead of being promoted. This
> should reduce the number of required cache promotions.
>
> You are correct that reads have a lower priority for being cached; ideally
> they should only be cached when they are read very frequently.
>
>
> Can you try setting min_read_recency_for_promote to something higher?
>
> I looked into the setting before, but I must admit its exact purpose still
> eludes me. Would it be correct to simplify it as 'min_read_recency_for_promote
> determines the number of times a piece of data has to be read within a certain
> interval (set by hit_set_period) in order to promote it to the caching tier'?

Yes, that's correct. Every hit_set_period (assuming there is IO going on) a new hit set is created, up to the hit_set_count limit. The recency setting defines how many of the last x hit sets an object must have been accessed in before it is promoted.

Tuning it is a bit of a dark art at the moment, as you have to try to get all the values to match your workload. For starters, try something like:

min_read_recency_for_promote = 2 or 3
hit_set_count = 10
hit_set_period = 60

This means that if an object has been read in each of the last 2 or 3 hit sets (i.e. within the last few minutes) it will be promoted. There is no granularity below a single hit set, so if an object gets hit 1000 times in 1 minute but then nothing for 5 minutes, it will not cause a promotion.
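
Roughly speaking, the settings above can be applied to the cache pool with something like the following ('cachepool' here is just a placeholder for the name of your cache pool):

  # 'cachepool' is a placeholder; substitute your cache pool's name
  # hit set tracking needs a type set; bloom is the usual choice
  ceph osd pool set cachepool hit_set_type bloom
  # keep 10 hit sets of 60 seconds each
  ceph osd pool set cachepool hit_set_count 10
  ceph osd pool set cachepool hit_set_period 60
  # require an object to appear in the last 2 hit sets before a read promotes it
  ceph osd pool set cachepool min_read_recency_for_promote 2

  # verify the current values
  ceph osd pool get cachepool hit_set_count
  ceph osd pool get cachepool hit_set_period
  ceph osd pool get cachepool min_read_recency_for_promote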

>
>
> Also, can you check what your hit_set_period and hit_set_count are currently
> set to?
>
> hit_set_count is set to 1 and hit_set_period to 1800.
>
> What would increasing the hit_set_count do exactly?
>
>
>
> > Right now I'm unsure how to scientifically test the performance of retrieving
> > files when there is a cache miss. If somebody could point me towards a
> > better way of doing that I would appreciate the help.
> >
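As a rough sketch of one way to get a more repeatable number for cold (cache-miss) reads, you could benchmark the pool with rados and flush/evict the cache tier between runs, along these lines (pool names are placeholders):

  # 'ecpool' and 'cachepool' are placeholder pool names
  # write some benchmark objects and leave them in place for the read test
  rados bench -p ecpool 60 write --no-cleanup
  # flush and evict everything from the cache tier so the next reads are misses
  rados -p cachepool cache-flush-evict-all
  # sequential reads; with the cache tier empty these have to come from the EC pool
  rados bench -p ecpool 60 seq
  # remove the benchmark objects afterwards
  rados -p ecpool cleanup
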
> > Another thing is that I'm seeing a lot of messages popping up in dmesg on
> > my client server on which the RBD volumes are mounted. (IPs removed)
> >
> > [685881.477383] libceph: osd50 :6800 socket closed (con state OPEN)
> > [685895.597733] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.663971] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.710424] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.749417] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685896.517778] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685906.690445] libceph: osd74 :6824 socket closed (con state OPEN)
> >
> > Is this a symptom of something?
>
> These are just stale connections to the OSDs timing out after the idle period
> and are nothing to worry about.
>
> Glad to hear that, I was fearing something might be wrong.
>
> Thanks again.
>
> Peter


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
