Hi,

yeah, you are right, of course! Sorry... It's CentOS 7 (default kernel
3.10), ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269).

----

In the end I want/need a setup where everything goes through the cache
tier (reads and writes). Writes shall, after some time without
modification, be flushed to the cold pool. Reads shall remain in the
cache pool as long as the data is read frequently.

To me, writeback mode should do exactly that. But somehow it does not.

Since my thread-opening mail, 1.5 hours have passed, and the objects are
still not evicted. So I still have these objects inside the cache pool,
although after 3600 seconds they should be gone.

So there seems to be some major problem here.

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107
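A side note on the eviction question above: cache_min_flush_age and
cache_min_evict_age appear to be only lower bounds on object age; the
actual flush and evict work is triggered by the dirty/full ratios
relative to target_max_bytes (or target_max_objects). With
target_max_bytes at 800 GB and only a couple of GB sitting in
ssd_cache, the agent has nothing it is required to move yet, however
much time passes. A rough sketch of how to check this, and to force a
flush/evict as a one-off test; these are standard ceph/rados CLI calls,
with pool names as used in this thread:

  # Per-pool usage, including the DIRTY object count of the cache tier
  ceph df detail

  # The thresholds the agent actually acts on (fractions of target_max_bytes)
  ceph osd pool get ssd_cache target_max_bytes
  ceph osd pool get ssd_cache cache_target_dirty_ratio
  ceph osd pool get ssd_cache cache_target_full_ratio

  # Force-flush and evict everything currently in the cache pool, just
  # to confirm that objects can move back to cephfs_data at all
  rados -p ssd_cache cache-flush-evict-all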
On 14.06.2016 at 01:52, Samuel Just wrote:
> I'd have to look more closely, but these days promotion is
> probabilistic and throttled. During each read of those objects, it
> will tend to promote a few more of them, depending on how many
> promotions are in progress and how hot it thinks a particular object
> is. The lack of a speed-up is a bummer, but I guess you aren't
> limited by the disk throughput here for some reason. Writes can also
> be passed directly to the backing tier depending on similar factors.
>
> It's usually helpful to include the version you are running.
> -Sam
>
> On Mon, Jun 13, 2016 at 3:37 PM, Oliver Dzombic <info@xxxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I am certainly not very experienced yet with ceph or with cache
>> tiering, but to me it seems to behave strangely.
>>
>> Setup:
>>
>> pool 3 'ssd_cache' replicated size 2 min_size 1 crush_ruleset 1
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 190 flags
>> hashpspool,incomplete_clones tier_of 4 cache_mode writeback target_bytes
>> 800000000000 hit_set bloom{false_positive_probability: 0.05,
>> target_size: 0, seed: 0} 3600s x1 decay_rate 0 search_last_n 0
>> stripe_width 0
>>
>> pool 4 'cephfs_data' replicated size 2 min_size 1 crush_ruleset 2
>> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 169 lfor 144
>> flags hashpspool crash_replay_interval 45 tiers 3 read_tier 3 write_tier
>> 3 stripe_width 0
>>
>> pool 5 'cephfs_metadata' replicated size 2 min_size 1 crush_ruleset 1
>> object_hash rjenkins pg_num 128 pgp_num 128 last_change 191 flags
>> hashpspool stripe_width 0
>>
>> hit_set_count: 1
>> hit_set_period: 120
>> target_max_bytes: 800000000000
>> min_read_recency_for_promote: 0
>> min_write_recency_for_promote: 0
>> target_max_objects: 0
>> cache_target_dirty_ratio: 0.5
>> cache_target_dirty_high_ratio: 0.8
>> cache_target_full_ratio: 0.9
>> cache_min_flush_age: 1800
>> cache_min_evict_age: 3600
>>
>> rule ssd-cache-rule {
>>         ruleset 1
>>         type replicated
>>         min_size 2
>>         max_size 10
>>         step take ssd-cache
>>         step chooseleaf firstn 0 type host
>>         step emit
>> }
>>
>> rule cold-storage-rule {
>>         ruleset 2
>>         type replicated
>>         min_size 2
>>         max_size 10
>>         step take cold-storage
>>         step chooseleaf firstn 0 type host
>>         step emit
>> }
>>
>> [root@cephmon1 ceph-cluster-gen2]# rados -p ssd_cache ls
>> [root@cephmon1 ceph-cluster-gen2]#
>> -> empty
>>
>> Now, on a cephfs-mounted client I have files.
>>
>> Read operation:
>>
>> dd if=testfile of=/dev/zero
>>
>> 1494286336 bytes (1.5 GB) copied, 11.047 s, 135 MB/s
>>
>> [root@cephosd1 ~]# rados -p ssd_cache ls
>> 1000000001e.00000010
>> 1000000001e.00000004
>> 1000000001e.00000001
>> 1000000001e.0000000c
>> 1000000001e.00000008
>> 1000000001e.00000003
>> 1000000001e.00000000
>> 1000000001e.00000002
>>
>> Running this multiple times in a row does not change the contents; it
>> is always the same objects.
>>
>> -------------
>>
>> OK, so according to the documentation for writeback mode, the data
>> was moved from cold storage to hot storage (cephfs_data to ssd_cache
>> in my case).
>>
>> Now I repeat it:
>>
>> dd if=testfile of=/dev/zero
>>
>> 1494286336 bytes (1.5 GB) copied, 11.311 s, 132 MB/s
>>
>> [root@cephosd1 ~]# rados -p ssd_cache ls
>> 1000000001e.00000010
>> 1000000001e.00000004
>> 1000000001e.00000001
>> 1000000001e.0000000c
>> 1000000001e.0000000d
>> 1000000001e.00000005
>> 1000000001e.00000008
>> 1000000001e.00000015
>> 1000000001e.00000011
>> 1000000001e.00000006
>> 1000000001e.00000003
>> 1000000001e.00000009
>> 1000000001e.00000000
>> 1000000001e.0000000a
>> 1000000001e.0000001b
>> 1000000001e.00000002
>>
>> So why are the old objects (the original 8) now there, plus another 8
>> objects?
>>
>> Repeating this grows the number of objects in ssd_cache endlessly,
>> without speeding up the dd.
>>
>> So on every new dd read of exactly the same file (which to me means
>> the same PGs/objects), the (same) data is copied from the cold pool
>> to the cache pool, and from there pushed to the client (without any
>> speed gain).
>>
>> And that is not supposed to happen (according to the documentation
>> for writeback cache mode).
>>
>> Something similar happens when I am writing: if I write, the data is
>> stored on both the cold pool and the cache pool.
>>
>> To my understanding, with my configuration at least 1800 seconds
>> (cache_min_flush_age) should pass before the agent starts to flush
>> from the cache pool to the cold pool. But that is not what happens.
>>
>> So, is there something specific about cephfs, or is my config just
>> crappy and I have no idea what I am doing here?
>>
>> Anything is highly welcome!
>>
>> Thank you!
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:info@xxxxxxxxxxxxxxxxx
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
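What Sam describes above would also explain why each dd run promotes
only a handful of additional objects into ssd_cache: promotion is
probabilistic and rate-limited per OSD. A sketch of how one might
inspect, and for a test temporarily relax, that throttle; the option
names assume the osd_tier_promote_max_* settings shipped with Jewel,
and the raised values are purely illustrative, not recommendations
from this thread:

  # Current promotion throttle on one OSD (run on the host carrying osd.0)
  ceph daemon osd.0 config get osd_tier_promote_max_objects_sec
  ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec

  # Temporarily raise the throttle on all OSDs for a test run
  ceph tell osd.* injectargs '--osd_tier_promote_max_objects_sec 200 --osd_tier_promote_max_bytes_sec 104857600'

  # With min_read_recency_for_promote at 0, every read already qualifies
  # for promotion, so it is the throttle, not recency, that limits how
  # many objects move into the cache tier per pass
  ceph osd pool get ssd_cache min_read_recency_for_promote

After such a test the values should be set back, or made permanent in
ceph.conf, as appropriate.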