On Tue, Feb 24, 2015 at 6:21 AM, Xavier Villaneau <xavier.villaneau@xxxxxxxxxxxx> wrote:
> Hello ceph-users,
>
> I am currently running tests on a small cluster, and cache tiering is one
> of them. The cluster runs Ceph 0.87 Giant on three Ubuntu 14.04 servers
> with the 3.16.0 kernel, for a total of 8 OSDs and 1 MON.
>
> Since there are no SSDs in those servers, I am testing cache tiering by
> using an erasure-coded pool as storage and a replicated pool as cache. The
> cache settings are the defaults from the documentation, and I'm using
> writeback mode. To simulate a small amount of cache space, the hot
> storage pool has a 1024 MB quota. I then write 4 MB chunks of data to the
> storage pool using 'rados bench' (with --no-cleanup).
>
> Here are my cache pool settings according to InkScope:
>
>   pool 15
>   pool name                        test1_ct-cache
>   auid                             0
>   type                             1 (replicated)
>   size                             2
>   min size                         1
>   crush ruleset                    0 (replicated_ruleset)
>   pg num                           512
>   pg placement num                 512
>   quota max_bytes                  1 GB
>   quota max_objects                0
>   flags names                      hashpspool,incomplete_clones
>   tiers                            none
>   tier of                          14 (test1_ec-data)
>   read tier                        -1
>   write tier                       -1
>   cache mode                       writeback
>   cache target_dirty_ratio_micro   40 %
>   cache target_full_ratio_micro    80 %
>   cache min_flush_age              0 s
>   cache min_evict_age              0 s
>   target max_objects               0
>   target max_bytes                 960 MB
>   hit set_count                    1
>   hit set_period                   3600 s
>   hit set_params                   target_size: 0
>                                    seed: 0
>                                    type: bloom
>                                    false_positive_probability: 0.050000
>
> I believe the tiering itself works well: I do see objects and bytes being
> transferred from the cache to the storage when I write data. I checked
> with 'rados ls', and the object count in the cold storage is always spot
> on.
> But it isn't right in the cache: in 'ceph df' or 'rados df' the space and
> object counts do not match what 'rados ls' reports, and are usually much
> larger:
>
>   % ceph df
>   …
>   POOLS:
>       NAME             ID   USED    %USED   MAX AVAIL   OBJECTS
>       …
>       test1_ec-data    14   5576M   0.04    11115G      1394
>       test1_ct-cache   15   772M    0       7410G       250
>   % rados -p test1_ec-data ls | wc -l
>   1394
>   % rados -p test1_ct-cache ls | wc -l
>   56
>   # And this corresponds to 220M of data in test1_ct-cache
>
> Not only does this prevent me from knowing exactly what the cache is
> doing, but it is also the value that is applied to the quota. I have seen
> write operations fail because the space count had reached 1 GB, although
> I was quite sure there was enough space. The count does not correct
> itself over time, even after waiting overnight. It only changes when I
> "poke" the pool by changing a setting or writing data, but remains wrong
> (and not off by the same number of objects). The changes in the object
> counts given by 'rados ls' in both pools do match the number of objects
> written by 'rados bench'.
>
> Does anybody know where this mismatch might come from? Is there a way to
> see more details about what's going on? Or is this the normal behavior of
> a cache pool when 'rados bench' is used?

Well, I don't think the quota machinery is going to interact well with
cache pools; the two size limits are enforced at different places in the
stack. Similarly, 'rados ls' definitely doesn't work properly on cache
pools; you shouldn't expect anything sensible to come out of it. Among
other things, there are "whiteout" objects in the cache pool (recording
that an object is known not to exist in the base pool) that won't be
listed by 'rados ls', and I'm sure there's other stuff too.

If you're trying to limit the cache pool's size, you want to do that with
the target size and the dirty targets/limits, not with a quota.
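Concretely, the size limits Greg refers to can be set with 'ceph osd pool set' on the cache pool. A minimal sketch, using the pool name from the thread and the 1 GB limit as an illustrative value (this needs a live cluster and appropriate credentials):

```shell
# Cap the cache tier at 1 GB; the tiering agent flushes and evicts
# relative to this target rather than to a hard quota.
ceph osd pool set test1_ct-cache target_max_bytes 1073741824

# Start flushing dirty objects once 40% of the target is dirty,
# and start evicting once the cache is 80% full.
ceph osd pool set test1_ct-cache cache_target_dirty_ratio 0.4
ceph osd pool set test1_ct-cache cache_target_full_ratio 0.8

# Drop the conflicting pool quota (0 disables it), so writes are
# throttled by the tiering agent instead of failing on the quota.
ceph osd pool set-quota test1_ct-cache max_bytes 0
```

With this setup the agent evicts clean objects as the cache approaches target_max_bytes, so writes degrade gracefully instead of failing outright when the (inaccurate) per-pool stats hit a quota.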
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com