On 06/02/2015 07:21 PM, Paul Evans wrote:
Kenneth,
My guess is that you’re hitting the cache_target_full_ratio on an
individual OSD, which is easy to do since most of us tend to think of
the cache_target_full_ratio as an aggregate of the OSDs (which it is not
according to Greg Farnum). This posting may shed more light on the
issue, if it is indeed what you are bumping up against.
https://www.mail-archive.com/ceph-users%40lists.ceph.com/msg20207.html
It does look like that is the case. The question then is: why is it not flushing more?
BTW: how are you determining that your OSDs are ‘not overloaded?’
Are you judging that by iostat utilization, or by capacity consumed?
iostat is showing low utilisation on all disks; some disks are doing
'nothing':
Device:   rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
sdn         0.00     0.00   813.50   415.00    16.90    15.49    53.99     0.42     0.35     0.15     0.72     0.10    12.00
sdm         0.00     0.00   820.50   490.50    13.06    21.99    54.76     0.70     0.54     0.18     1.13     0.12    15.50
sdq         0.00     1.50    14.00    47.00     0.98     0.33    43.99     0.55     8.93    18.93     5.96     6.31    38.50
sdr         0.00     0.00     0.00     0.50     0.00     0.00    14.00     0.00     0.00     0.00     0.00     0.00     0.00
sdd         0.00     9.50     4.00    21.50     0.27     1.47   140.00     0.12     4.71     2.50     5.12     4.31    11.00
sda         0.00     8.50     2.50    14.50     0.26     0.71   116.91     0.08     4.41     4.00     4.48     4.71     8.00
sdh         0.00     6.00     2.00    15.00     0.25     1.10   162.59     0.07     3.82     7.50     3.33     3.53     6.00
sdf         0.00    17.50     3.00    25.00     0.32     1.01    97.48     0.23     8.21     5.00     8.60     8.21    23.00
sdi         0.00    11.00     1.00    31.50     0.07     2.23   144.60     0.14     4.46     0.00     4.60     3.85    12.50
sdo         0.00     0.00     0.00     1.00     0.00     0.00     8.00     0.00     0.00     0.00     0.00     0.00     0.00
sdk         0.00     0.00    22.50     0.00     1.58     0.00   143.82     0.13     5.78     5.78     0.00     4.00     9.00
sdg         0.00     2.50     0.00    30.00     0.00     3.35   228.52     0.14     4.50     0.00     4.50     1.33     4.00
sdc         0.00    12.50     1.50    23.50     0.01     1.36   111.68     0.17     6.80     0.00     7.23     6.20    15.50
sdj         0.00    18.50    27.50    30.50     2.28     1.65   138.82     0.43     7.33     7.82     6.89     5.86    34.00
sde         0.00     4.00     0.50    15.00     0.04     0.10    18.10     0.07     4.84    10.00     4.67     2.58     4.00
sdl         0.00    23.00     6.00    33.00     0.58     1.31    99.22     0.28     7.05    17.50     5.15     6.79    26.50
sdb         0.00     5.00     3.00     9.00     0.12     0.47   100.29     0.05     4.58     1.67     5.56     3.75     4.50
In my opinion there should be enough resources left to do the flushing,
so the cache should not be filling up.
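A quick way to repeat this check is to filter extended iostat output for
devices above a utilisation threshold. This is only a sketch: busy_disks is
a made-up helper name, and it assumes that %util is the last column and that
the device names start with "sd", as in the output above. Column layout
differs between sysstat versions, so check the header first.

```shell
#!/bin/sh
# busy_disks LIMIT: read iostat -x style lines on stdin and print
# "device %util" for every sd* device whose %util is >= LIMIT.
# Assumption: %util is the last field of each device line.
busy_disks() {
    awk -v limit="$1" '$1 ~ /^sd/ && $NF + 0 >= limit { print $1, $NF }'
}

# Example with two rows copied from the output above:
printf '%s\n' \
    'sdq 0.00 1.50 14.00 47.00 0.98 0.33 43.99 0.55 8.93 18.93 5.96 6.31 38.50' \
    'sdr 0.00 0.00 0.00 0.50 0.00 0.00 14.00 0.00 0.00 0.00 0.00 0.00 0.00' \
    | busy_disks 30
```

On a live node the same filter could be fed from something like
"iostat -xm 5 2 | busy_disks 30". With a limit of 30, only sdq and sdj from
the table above would be listed.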
--
Paul
On Jun 2, 2015, at 9:53 AM, Kenneth Waegeman
<kenneth.waegeman@xxxxxxxx> wrote:
Hi,
we were rsync-streaming with 4 cephfs clients to a ceph cluster with a
cache layer on top of an erasure coded pool.
This had been running for some time without real problems.
Today we added 2 more streams, and very soon we saw some strange
behaviour:
- We are getting blocked requests on our cache pool OSDs
- Our cache pool is often near or at the max ratio
- Our data streams have very bursty IO (streaming a few hundred MB
for a minute and then nothing)
Our OSDs are not overloaded (neither the EC nor the cache ones, checked
with iostat), yet it seems the cache pool cannot evict objects in time,
and requests get blocked until it has, again and again.
If I raise the target_max_bytes limit, it starts streaming again until
the cache is full again.
The cache parameters we have are these:
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((14*75*1024*1024*1024))
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8
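For reference, the pool-wide thresholds that these settings imply can be
worked out with plain shell arithmetic. This is a sketch only: it works on
the pool totals from the commands above, and as noted earlier in the thread
the ratios may effectively be hit on a smaller unit than the whole pool, so
an individual OSD can reach its share well before these totals are reached.

```shell
#!/bin/sh
# Values copied from the pool settings above.
gib=$((1024*1024*1024))
target_max_bytes=$((14*75*gib))
dirty_pct=40    # cache_target_dirty_ratio 0.4, as a percentage
full_pct=80     # cache_target_full_ratio 0.8, as a percentage

# Integer arithmetic: multiply before dividing to avoid losing precision.
dirty_bytes=$((target_max_bytes * dirty_pct / 100))
full_bytes=$((target_max_bytes * full_pct / 100))

echo "target_max_bytes:       $((target_max_bytes / gib)) GiB"   # 1050 GiB
echo "dirty (flush) threshold: $((dirty_bytes / gib)) GiB"       # 420 GiB
echo "full (evict) threshold:  $((full_bytes / gib)) GiB"        # 840 GiB
```

So, taken over the whole pool, flushing would be expected to start at around
420 GiB of dirty data and eviction at around 840 GiB used.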
What could be the issue here? I tried to find some information about
the 'cache agent', but could only find some old references.
Thank you!
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com