Re: Erasure pool performance expectations

> -----Original Message-----
> From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
> Sent: 03 May 2016 12:15
> To: nick@xxxxxxxxxx
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Erasure pool performance expectations
> 
> Hey Nick,
> 
> Thanks for taking the time to answer my questions. Some in-line comments.
> 
> On Tue, May 3, 2016 at 10:51 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Peter,
> 
> 
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> > Peter Kerdisle
> > Sent: 02 May 2016 08:17
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject:  Erasure pool performance expectations
> >
> > Hi guys,
> >
> > I am currently testing the performance of RBD using a cache pool and a 4/2
> > erasure profile pool.
> >
> > I have two SSD cache servers (2 SSDs for journals, 7 SSDs for data) with
> > 2x10Gbit bonded each, and six OSD nodes with a 10Gbit public and 10Gbit
> > cluster network for the erasure pool (10x3TB without separate journal).
> > This is all on Jewel.
> >
> > What I would like to know is whether the performance I'm seeing is to be
> > expected, and if there is some way to test this in a more quantifiable way.
> >
> > Everything works as expected if the files are present on the cache pool,
> > however when things need to be retrieved from the erasure pool I see
> > performance degradation. I'm trying to simulate real usage as much as
> > possible by retrieving files from the RBD volume over FTP from a client
> > server. What I'm seeing is that the FTP transfer will stall for seconds
> > at a time and then get some more data, which results in an average speed
> > of 200KB/s. From the cache this is closer to 10MB/s. Is this the expected
> > behaviour from an erasure-coded tier with a cache in front?
> 
> Unfortunately yes. The whole erasure/cache-tier combination only really
> works well if the data in the EC tier is accessed infrequently; otherwise
> the overheads of cache promotion/flushing quickly bring the cluster to its
> knees. However, it looks as though you are mainly doing reads, which means
> you can probably alter your cache settings to not promote so aggressively
> on reads, as reads can be proxied through to the EC tier instead of
> triggering a promotion. This should reduce the number of required cache
> promotions.
> 
> You are correct that reads should have a lower priority for being cached;
> ideally, only objects that are read very frequently should be promoted.
> 
> 
> Can you try setting min_read_recency_for_promote to something higher?
> 
> I looked into the setting before, but I must admit its exact purpose still
> eludes me. Would it be correct to simplify it as
> 'min_read_recency_for_promote determines the number of times an object has
> to be read within a certain interval (set by hit_set_period) in order to be
> promoted to the cache tier'?

Yes, that's correct. Every hit_set_period seconds (assuming there is I/O going on) a new hitset is created, up to the hit_set_count limit. The recency value defines how many of the last x hitsets an object must have been accessed in before it is promoted.

Tuning it is a bit of a dark art at the moment, as you have to try to get all the values to match your workload. For starters, try something like:

min_read_recency_for_promote = 2 or 3
hit_set_count = 10
hit_set_period = 60

This means an object has to have been read in each of the last 2 or 3 one-minute hitsets before it is promoted. There is no granularity below a single hitset, so if an object gets hit 1,000 times in 1 minute but then nothing for 5 minutes, it will not cause a promotion.
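
These can all be changed at runtime on the cache pool. As a minimal sketch, assuming the cache pool is named "hot-pool" (substitute your own pool name):

  # track hits with bloom-filter hitsets (the default and recommended type)
  ceph osd pool set hot-pool hit_set_type bloom
  # keep the last 10 hitsets, one per 60-second period
  ceph osd pool set hot-pool hit_set_count 10
  ceph osd pool set hot-pool hit_set_period 60
  # require a read in each of the last 3 hitsets before promoting
  ceph osd pool set hot-pool min_read_recency_for_promote 3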

> 
> 
> Also, can you check what your hit_set_period and hit_set_count are
> currently set to.
> 
> hit_set_count is set to 1 and hit_set_period to 1800.
> 
> What would increasing the hit_set_count do exactly?
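
As above, a new hitset is only started every hit_set_period, and only hit_set_count of them are kept; with a count of 1 there is only ever a single hitset for the recency check to consult, so increasing the count gives the recency logic more history to work with. If it helps, you can read the current values back with something like the following (pool name is a placeholder):

  ceph osd pool get hot-pool hit_set_count
  ceph osd pool get hot-pool hit_set_period
  ceph osd pool get hot-pool min_read_recency_for_promote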
> 
> 
> 
> > Right now I'm unsure how to scientifically test the performance of
> > retrieving files when there is a cache miss. If somebody could point me
> > towards a better way of doing that, I would appreciate the help.
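
One way to make the cache-miss case reproducible is to flush and evict the cache tier before each read test, then time a cold, direct read. A rough sketch, with pool and device names as placeholders:

  # flush dirty objects and evict everything from the cache tier
  rados -p hot-pool cache-flush-evict-all

  # time a cold read from the mapped RBD device, bypassing the page cache
  dd if=/dev/rbd0 of=/dev/null bs=4M count=256 iflag=direct

Something like fio with its rbd ioengine would give you more detailed latency numbers if you need them.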
> >
> > Another thing is that I'm seeing a lot of messages popping up in dmesg on
> > my client server, on which the RBD volumes are mounted. (IPs removed)
> >
> > [685881.477383] libceph: osd50 :6800 socket closed (con state OPEN)
> > [685895.597733] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.663971] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.710424] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.749417] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685896.517778] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685906.690445] libceph: osd74 :6824 socket closed (con state OPEN)
> >
> > Is this a symptom of something?
> 
> This is just stale connections to the OSDs timing out after the idle
> period, and is nothing to worry about.
> 
> Glad to hear that; I was fearing something might be wrong.
> 
> Thanks again.
> 
> Peter

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



