Re: is there any way to speed up cache evicting?

David Turner <drakonstein@xxxxxxxxx> · Fri, 02 Jun 2017 13:41:24 +0000

I'm thinking you have erasure coding in cephfs and only use cache tiering because you have to, correct? What is your use case for repeated file accesses? How much data is written into cephfs at a time?
For me, my files are infrequently accessed after they are written or read from the EC back-end pool. I set my cache pool to never leave any data in it older than an hour. I have a buddy with a similar setup, with the difference that files added to cephfs are heavily accessed and modified for the first day, but then infrequently accessed. He has his settings keep all of his accessed files in cache for 24 hours before they are cleaned up.
We do that by having a target_max_ratio of 0.0 and the min_evict and min_flush ages set appropriately. The cluster never chooses what to flush or evict based on maintaining the full ratio. As soon as the minimum ages are met, an object is added to the queue to be processed.
This fixed our cluster speeds during times when the cache pool was cleaning up. We hypothesized that the process of choosing what to clean versus what to keep was what caused the slowdown.
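
A minimal sketch of the age-based part of that setup, for illustration only. It assumes the "min_flush" and "min_evict" ages above correspond to the pool options cache_min_flush_age and cache_min_evict_age (both in seconds). The pool name data_cache is borrowed from the quoted thread below, and the DRY_RUN switch is a hypothetical helper, not a Ceph feature. The ratio setting is left out because it is not clear which pool option is meant by target_max_ratio.

```shell
#!/bin/sh
# Sketch: set the minimum flush/evict ages on a cache pool.
# Assumptions: pool name and age are examples; DRY_RUN is a local helper.

# usage: set_cache_ages <pool> <age_seconds>
# Prints the commands when DRY_RUN=1 (the default), runs them otherwise.
set_cache_ages() {
    pool=$1
    age_seconds=$2
    for opt in cache_min_flush_age cache_min_evict_age; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ceph osd pool set $pool $opt $age_seconds"
        else
            ceph osd pool set "$pool" "$opt" "$age_seconds"
        fi
    done
}

# One hour, as in the first setup; 86400 would match the 24-hour setup.
set_cache_ages data_cache 3600
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 to apply them.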

On Fri, Jun 2, 2017, 4:54 AM jiajia zhong <zhong2plus@xxxxxxxxx> wrote:
Thank you for your guidance :) It makes sense.

2017-06-02 16:17 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:

Hello,

On Fri, 2 Jun 2017 14:30:56 +0800 jiajia zhong wrote:

> christian, thanks for your reply.

>

> 2017-06-02 11:39 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:

>

> > On Fri, 2 Jun 2017 10:30:46 +0800 jiajia zhong wrote:

> >

> > > hi guys:

> > >

> > > Our ceph cluster is working with tier cache.

> > If so, then I suppose you read all the discussions here as well and not

> > only the somewhat lacking documentation?

> >

> > > I am running "rados -p data_cache cache-try-flush-evict-all" to evict all

> > > the objects.

> > Why?

> > And why all of it?

>

>

> We found that when the flush/evict threshold was triggered, the
> performance drop upset us a bit :), so I would like to flush/evict the tier
> during idle time, e.g. in the middle of the night. That way the tier would
> not have to spend effort on flush/evict during the heavy read/write
> operations on the cephfs we are using.

>

As I said, eviction (which is basically zeroing the data in cache) has
very little impact.
Flushing, moving data from the cache tier to the main pool, is another
story.

But what you're doing here is completely invalidating your cache (the
eviction part), so the performance will be very bad after this as well.

If you have low utilization periods, consider a cron job that lowers the
dirty ratio (causing only flushes to happen) and then after a while (a few
minutes should do, experiment) restores the old setting.

For example:
---
# Preemptive Flushing before midnight
45 23 * * * root ceph osd pool set cache cache_target_dirty_ratio 0.52
# And back to normal levels
55 23 * * * root ceph osd pool set cache cache_target_dirty_ratio 0.60
---

This will of course only help if the amount of data promoted into your
cache per day is small enough to fit into the flushed space.
Otherwise your cluster has no other choice but to start flushing when
things get full.
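
A rough worked example of that constraint, as a sketch only. It assumes the dirty ratio is measured against the pool's target_max_bytes, and the 1 TiB cache size is hypothetical, not a figure from this thread. Percentages are passed as integers so plain shell arithmetic is enough.

```shell
#!/bin/sh
# Headroom created by temporarily lowering cache_target_dirty_ratio.
# Assumption: ratios apply to target_max_bytes; the 1 TiB size is hypothetical.

# usage: flush_headroom_bytes <target_max_bytes> <normal_pct> <lowered_pct>
flush_headroom_bytes() {
    echo $(( $1 * ($2 - $3) / 100 ))
}

# 1 TiB cache, ratio lowered from 0.60 to 0.52 as in the cron example
flush_headroom_bytes 1099511627776 60 52
```

With these numbers the result is 87960930222 bytes, roughly 82 GiB. If more dirty data than that arrives before the next preemptive flush, flushing will start again during normal hours.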

Christian

> >

> > > But it is a bit slow.

> > >

> > Define slow, but it has to do a LOT of work and housekeeping to do this,

> > so unless your cluster is very fast (probably not, or you wouldn't

> > want/need a cache tier) and idle, that's the way it is.

> >

> > > 1. Is there any way to speed up the evicting?

> > >

> > Not really, see above.

> >

> > > 2. Is evicting triggered by itself good enough for the cluster?

> > >

> > See above, WHY are you manually flushing/evicting?

> >

> explained above.

>

>

> > Are you aware that flushing is the part that's very I/O intensive, while

> > evicting is a very low cost/impact operation?

> >

> Not very sure, but my instinct told me so.

>

>

> > In normal production, the various parameters that control this will do

> > fine, if properly configured of course.

> >

> > > 3. Does the flushing and evicting slow down the whole cluster?

> > >

> > Of course, as any good sysadmin with the correct tools (atop, iostat,

> > etc, graphing Ceph performance values with Grafana/Graphite) will be able

> > to see instantly.

>

> Actually, we are using graphite, but I could not see that instantly, lol
> :(. I could only work out that the threshold had been triggered by
> calculating after the fact.
>
> btw, we use cephfs to store a huge number of small files (64T, about
> 100K per file).

>

>

> >

> >

> > Christian

> > --

> > Christian Balzer        Network/Systems Engineer

> > chibi@xxxxxxx           Rakuten Communications

> >

--

Christian Balzer        Network/Systems Engineer

chibi@xxxxxxx           Rakuten Communications

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
