Hello ceph-users,
I am currently running tests on a small cluster, and Cache Tiering is one
of them. The cluster runs Ceph 0.87 Giant on three Ubuntu 14.04
servers with the 3.16.0 kernel, for a total of 8 OSDs and 1 MON.
Since there are no SSDs in those servers, I am testing Cache Tiering by
using an erasure-coded pool as the cold storage and a replicated pool as
the cache. The cache settings are the defaults you will find in the
documentation, and I am using writeback mode. To simulate a small cache,
the hot (cache) pool has a 1024 MB space quota. I then write 4 MB
objects to the storage pool using 'rados bench' (with --no-cleanup), as
sketched below.
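For reference, this is roughly how I set the tier up (a sketch from
memory; the PG counts and the default erasure-code profile are what I
used for the cache pool, the EC pool may differ slightly):

% ceph osd pool create test1_ec-data 512 512 erasure
% ceph osd pool create test1_ct-cache 512 512 replicated
% ceph osd tier add test1_ec-data test1_ct-cache
% ceph osd tier cache-mode test1_ct-cache writeback
% ceph osd tier set-overlay test1_ec-data test1_ct-cache
% ceph osd pool set-quota test1_ct-cache max_bytes 1073741824   # 1024 MB
% rados bench -p test1_ec-data 60 write --no-cleanup            # default 4 MB objects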
Here are my cache pool settings according to InkScope:
pool 15
pool name test1_ct-cache
auid 0
type 1 (replicated)
size 2
min size 1
crush ruleset 0 (replicated_ruleset)
pg num 512
pg placement_num 512
quota max_bytes 1 GB
quota max_objects 0
flags names hashpspool,incomplete_clones
tiers none
tier of 14 (test1_ec-data)
read tier -1
write tier -1
cache mode writeback
cache target_dirty_ratio_micro 40 %
cache target_full_ratio_micro 80 %
cache min_flush_age 0 s
cache min_evict_age 0 s
target max_objects 0
target max_bytes 960 MB
hit set_count 1
hit set_period 3600 s
hit set_params type: bloom, false_positive_probability: 0.05,
target_size: 0, seed: 0
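For what it's worth, those values correspond to settings applied
roughly like this (a sketch; 1006632960 is just 960 MB in bytes):

% ceph osd pool set test1_ct-cache hit_set_type bloom
% ceph osd pool set test1_ct-cache hit_set_count 1
% ceph osd pool set test1_ct-cache hit_set_period 3600
% ceph osd pool set test1_ct-cache cache_target_dirty_ratio 0.4
% ceph osd pool set test1_ct-cache cache_target_full_ratio 0.8
% ceph osd pool set test1_ct-cache target_max_bytes 1006632960  # 960 MB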
I believe the tiering itself works well: I do see objects and bytes
being transferred from the cache to the storage when I write data. I
checked with 'rados ls', and the object count in the cold storage pool
is always spot on. But it isn't for the cache: when I run 'ceph df' or
'rados df', the space and object counts do not match 'rados ls', and
they are usually much larger:
% ceph df
…
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
…
test1_ec-data 14 5576M 0.04 11115G 1394
test1_ct-cache 15 772M 0 7410G 250
% rados -p test1_ec-data ls | wc -l
1394
% rados -p test1_ct-cache ls | wc -l
56
# And this corresponds to 220M of data in test1_ct-cache
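If it helps narrow things down, I can also drain the cache entirely and
re-check the counters afterwards (a sketch, using the
cache-flush-evict-all subcommand from the cache tiering documentation):

% rados -p test1_ct-cache cache-flush-evict-all
% rados -p test1_ct-cache ls | wc -l
% ceph df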
Not only does this prevent me from knowing exactly what the cache is
doing, but it is also this value that is used to enforce the quota. I
have seen write operations fail because the space count had reached
1 GB, even though I was quite sure there was enough free space. The
count does not correct itself over time, even after waiting overnight.
It only changes when I "poke" the pool by changing a setting or writing
data, and it remains wrong (and not off by the same number of objects).
The changes in the object counts reported by 'rados ls' in both pools
do match the number of objects written by 'rados bench'.
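For reference, the per-pool statistics I have been comparing come from
these commands (a sketch; as far as I understand, 'ceph pg dump pools'
shows the raw per-pool stats that 'ceph df' and 'rados df' are derived
from):

% ceph df detail
% rados df
% ceph pg dump pools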
Does anybody know where this mismatch might come from? Is there a way
to see more details about what's going on? Or is it the normal behavior
of a cache pool when 'rados bench' is used?
Thank you in advance for any help.
Regards,
--
Xavier