Hello Cephers,

Can someone help me with my cache tier configuration? I have 4 identical 176 GB SSD drives (184196208K each) in the SSD pool. How should I determine target_max_bytes? I assumed it should be:

  (4 drives * 188616916992 bytes) / 3 replicas = 251489222656 bytes
  251489222656 bytes * 85% (to stay under the full-disk warning) = 213765839257 bytes, i.e. ~200 GB

I set it a bit lower than that (160 GB), and after some time the whole cluster stopped on a full-disk error: one of the SSD drives was full. I can see that space usage across the OSDs is not equal:

  32 0.17099 1.00000 175G 127G 49514M 72.47 1.77  95
  42 0.17099 1.00000 175G 120G 56154M 68.78 1.68  90
  37 0.17099 1.00000 175G 136G 39670M 77.95 1.90 102
  47 0.17099 1.00000 175G 130G 46599M 74.09 1.80  97

My setup:

  ceph --admin-daemon /var/run/ceph/ceph-osd.32.asok config show | grep cache
    "debug_objectcacher": "0\/5",
    "mon_osd_cache_size": "10",
    "mon_cache_target_full_warn_ratio": "0.66",
    "mon_warn_on_cache_pools_without_hit_sets": "true",
    "client_cache_size": "16384",
    "client_cache_mid": "0.75",
    "mds_cache_size": "100000",
    "mds_cache_mid": "0.7",
    "mds_dump_cache_on_map": "false",
    "mds_dump_cache_after_rejoin": "false",
    "osd_pool_default_cache_target_dirty_ratio": "0.4",
    "osd_pool_default_cache_target_dirty_high_ratio": "0.6",
    "osd_pool_default_cache_target_full_ratio": "0.8",
    "osd_pool_default_cache_min_flush_age": "0",
    "osd_pool_default_cache_min_evict_age": "0",
    "osd_tier_default_cache_mode": "writeback",
    "osd_tier_default_cache_hit_set_count": "4",
    "osd_tier_default_cache_hit_set_period": "1200",
    "osd_tier_default_cache_hit_set_type": "bloom",
    "osd_tier_default_cache_min_read_recency_for_promote": "3",
    "osd_tier_default_cache_min_write_recency_for_promote": "3",
    "osd_map_cache_size": "200",
    "osd_pg_object_context_cache_count": "64",
    "leveldb_cache_size": "134217728",
    "filestore_omap_header_cache_size": "1024",
    "filestore_fd_cache_size": "128",
    "filestore_fd_cache_shards": "16",
    "keyvaluestore_header_cache_size": "4096",
    "rbd_cache": "true",
    "rbd_cache_writethrough_until_flush": "true",
    "rbd_cache_size": "33554432",
    "rbd_cache_max_dirty": "25165824",
    "rbd_cache_target_dirty": "16777216",
    "rbd_cache_max_dirty_age": "1",
    "rbd_cache_max_dirty_object": "0",
    "rbd_cache_block_writes_upfront": "false",
    "rgw_cache_enabled": "true",
    "rgw_cache_lru_size": "10000",
    "rgw_keystone_token_cache_size": "10000",
    "rgw_bucket_quota_cache_size": "10000",

Rule for SSD:

  rule ssd {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take ssd
          step choose firstn 2 type rack
          step chooseleaf firstn 2 type host
          step emit
          step take ssd
          step chooseleaf firstn -2 type osd
          step emit
  }

OSD tree with SSD:

   -8 0.68597 root ssd
   -9 0.34299     rack skwer-ssd
  -16 0.17099         host ceph40-ssd
   32 0.17099             osd.32  up  1.00000  1.00000
  -19 0.17099         host ceph50-ssd
   42 0.17099             osd.42  up  1.00000  1.00000
  -11 0.34299     rack nzoz-ssd
  -17 0.17099         host ceph45-ssd
   37 0.17099             osd.37  up  1.00000  1.00000
  -22 0.17099         host ceph55-ssd
   47 0.17099             osd.47  up  1.00000  1.00000

Can someone help? Any ideas? Is it normal that the whole cluster stops on a full-disk error in the cache tier? I was expecting that only the pool with the cache tier would stop, and that the other pools without a cache tier would keep working.

Best regards,
--
Mateusz Skała
mateusz.skala@xxxxxxxxxxx
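
PS. For reference, this is roughly how I arrived at the number and how such a value gets applied. The pool name "ssd-cache" below is only a placeholder for my real cache pool name:

  # capacity sizing (numbers from above)
  # 4 OSDs * 188616916992 bytes = 754467667968 bytes raw
  # 754467667968 / 3 replicas   = 251489222656 bytes usable
  # 251489222656 * 0.85         = 213765839257 bytes (~199 GiB)

  # applying the lower 160 GiB value I actually chose:
  ceph osd pool set ssd-cache target_max_bytes 171798691840    # 160 * 1024^3
  ceph osd pool set ssd-cache cache_target_full_ratio 0.8      # same as the default shown above
  ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4     # same as the default shown above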
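The asok dump above only shows the daemon-side defaults (osd_pool_default_* / osd_tier_default_*); the values actually in effect on the cache pool can be read back per pool, e.g. (same placeholder pool name):

  ceph osd pool get ssd-cache target_max_bytes
  ceph osd pool get ssd-cache cache_target_full_ratio
  ceph osd pool get ssd-cache cache_target_dirty_ratio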