Re: Erasure pool performance expectations

In addition to what Nick said, it's really valuable to watch your cache tier write behavior during heavy IO. One thing I noticed is that you said you have 2 SSDs for journals and 7 SSDs for data. If they are all of the same type, you're likely bottlenecked by the journal SSDs for writes: with filestore every write hits the journal first, so those 2 SSDs have to absorb the same write bandwidth as the 7 data SSDs. Compounded with the heavy promotions, that is going to really hold you back.

What you really want:

1) (assuming filestore) roughly equal large-write throughput between the journal SSDs and the data disks.

2) promotions to be limited to some reasonable fraction of the cache tier and/or network throughput (say 70%). This is why the user-configurable promotion throttles were added in Jewel (see the example below).

3) The cache tier to fill up quickly when empty but change slowly once it's full (i.e. limiting promotions and evictions). There's no real way to do this yet.
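
If you want to experiment with those Jewel throttles, something along these
lines should work. The numbers below are purely illustrative assumptions,
not recommendations; derive them from your own cache tier and network
throughput, and note that they are per-OSD limits:

    # cap promotion traffic per OSD, in bytes/sec and objects/sec
    ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 20971520'
    ceph tell osd.* injectargs '--osd_tier_promote_max_objects_sec 20'

The same options can also be set under [osd] in ceph.conf to make them
persistent across restarts.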

Mark

On 05/03/2016 08:40 AM, Peter Kerdisle wrote:
Thank you, I will attempt to play around with these settings and see if
I can achieve better read performance.

Appreciate your insights.

Peter

On Tue, May 3, 2016 at 3:00 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:



    > -----Original Message-----
    > From: Peter Kerdisle [mailto:peter.kerdisle@xxxxxxxxx]
    > Sent: 03 May 2016 12:15
    > To: nick@xxxxxxxxxx
    > Cc: ceph-users@xxxxxxxxxxxxxx
    > Subject: Re:  Erasure pool performance expectations
    >
    > Hey Nick,
    >
    > Thanks for taking the time to answer my questions. Some in-line
    comments.
    >
    > On Tue, May 3, 2016 at 10:51 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
    > Hi Peter,
    >
    >
    > > -----Original Message-----
    > > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
    > > Behalf Of Peter Kerdisle
    > > Sent: 02 May 2016 08:17
    > > To: ceph-users@xxxxxxxxxxxxxx
    > > Subject:  Erasure pool performance expectations
    > >
    > > Hi guys,
    > >
    > > I am currently testing the performance of RBD using a cache pool
    and a 4/2
    > > erasure profile pool.
    > >
    > > I have two SSD cache servers (2 SSDs for journals, 7 SSDs for
    data) with
    > > 2x10Gbit bonded each, and six OSD nodes with a 10Gbit public
    and 10Gbit
    > > cluster network for the erasure pool (10x3TB without separate
    journal).
    > This
    > > is all on Jewel.
    > >
    > > What I would like to know is if the performance I'm seeing is to be
    > expected
    > > and if there is some way to test this in a more quantifiable way.
    > >
    > > Everything works as expected if the files are present on the
    cache pool,
    > > however when things need to be retrieved from the erasure pool I see
    > > performance degradation. I'm trying to simulate real usage as
    much as
    > > possible and trying to retrieve files from the RBD volume over
    FTP from a
    > > client server. What I'm seeing is that the FTP transfer will
    stall for seconds
    > at a
    > > time and then get some more data which results in an average
    speed of
    > > 200KB/s. From the cache this is closer to 10MB/s. Is this the
    expected
    > > behaviour from an erasure-coded tier with cache in front?
    >
    > Unfortunately yes. The whole Erasure/Cache thing only really works
    well if
    > the data in the EC tier is only accessed infrequently, otherwise
    the overheads
    > in cache promotion/flushing quickly bring the cluster down to its
    knees.
    > However it looks as though you are mainly doing reads, which means
    you can
    > probably alter your cache settings to not promote so aggressively
    on reads,
    > as reads can be proxied through to the EC tier instead of
    promoting. This
    > should reduce the amount of required cache promotions.
    >
    > You are correct that reads have a lower priority for being cached;
    > ideally they should only be promoted when they are read very
    > frequently.
    >
    >
    > Can you try setting min_read_recency_for_promote to something higher?
    >
    > I looked into the setting before but I must admit its exact
    purpose eludes me
    > still. Would it be correct to simplify it as
    'min_read_recency_for_promote
    > determines the number of times an object would have to be read in a
    certain
    > interval (set by hit_set_period) in order to promote it to the
    caching tier' ?

    Yes that’s correct. Every hit_set_period (assuming there is IO going
    on) a new hitset is created up until the hit_set_count limit. The
    recency defines how many of the last x hitsets an object must have
    been accessed in.

    Tuning it is a bit of a dark art at the moment as you have to try
    and get all the values to match your workload. For starters try
    something like

    min_read_recency_for_promote = 2 or 3
    hit_set_count = 10
    hit_set_period = 60

    Which means that if an object is read in each of the last 2 or 3
    hitsets (i.e. repeatedly over the last few minutes) it will be
    promoted. There is no granularity below a single hitset, so if an
    object gets hit 1000 times in 1 minute but then nothing for 5 minutes
    it will not cause a promotion.
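
    As a rough sketch (not a definitive recipe), those values could be set
    on the cache pool like this, substituting your actual pool name for
    the "cache-pool" placeholder:

        # enable hitset tracking on the cache pool if it isn't already
        ceph osd pool set cache-pool hit_set_type bloom
        ceph osd pool set cache-pool hit_set_count 10
        ceph osd pool set cache-pool hit_set_period 60
        ceph osd pool set cache-pool min_read_recency_for_promote 2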

    >
    >
    > Also can you check what your hit_set_period and hit_set_count are
    currently
    > set to.
    >
    > hit_set_count is set to 1 and hit_set_period to 1800.
    >
    > What would increasing the hit_set_count do exactly?
    >
    >
    >
    > > Right now I'm unsure how to scientifically test the performance of
    retrieving
    > > files when there is a cache miss. If somebody could point me
    towards a
    > > better way of doing that I would appreciate the help.
    > >
    > > Another thing is that I'm seeing a lot of messages popping up
    in dmesg on
    > > my client server on which the RBD volumes are mounted. (IPs removed)
    > >
    > > [685881.477383] libceph: osd50 :6800 socket closed (con state OPEN)
    > > [685895.597733] libceph: osd54 :6808 socket closed (con state OPEN)
    > > [685895.663971] libceph: osd54 :6808 socket closed (con state OPEN)
    > > [685895.710424] libceph: osd54 :6808 socket closed (con state OPEN)
    > > [685895.749417] libceph: osd54 :6808 socket closed (con state OPEN)
    > > [685896.517778] libceph: osd54 :6808 socket closed (con state OPEN)
    > > [685906.690445] libceph: osd74 :6824 socket closed (con state OPEN)
    > >
    > > Is this a symptom of something?
    >
    > This is just stale connections to the OSDs timing out after the
    idle period and
    > is nothing to worry about.
    >
    > Glad to hear that, I was fearing something might be wrong.
    >
    > Thanks again.
    >
    > Peter




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




