Hello,

On Mon, 11 Jul 2016 16:19:58 +0200 Mateusz Skała wrote:

> Hello Cephers.
>
> Can someone help me in my cache tier configuration? I have 4 same SSD
> drives 176GB (184196208K) in SSD pool, how to determine target_max_bytes?

What exact SSD models are these?
What version of Ceph?

> I assume that should be (4 drives * 188616916992 bytes) / 3 replica =
> 251489222656 bytes * 85% (because of full disk warning)

In theory correct, but you might want to consider (like with all pools)
the impact of losing a single SSD.
In short, its data gets backfilled onto the remaining 3 SSDs, which may
then fill up anyway.

> It will be 213765839257 bytes ~200GB. I make this little bit lower
> (160GB) and after some time whole cluster stops on full disk error. One
> of SSD drives are full. I see that use of space at the osd is not equal:
>
> 32 0.17099 1.00000 175G 127G 49514M 72.47 1.77  95
> 42 0.17099 1.00000 175G 120G 56154M 68.78 1.68  90
> 37 0.17099 1.00000 175G 136G 39670M 77.95 1.90 102
> 47 0.17099 1.00000 175G 130G 46599M 74.09 1.80  97

What's the exact error message?
None of these OSDs are over 85% or 95%, so how are they "full"?

If the above is a snapshot from when Ceph thinks something is "full", it
may be an indication that you've reached target_max_bytes and Ceph simply
has no clean (flushed) objects ready to evict.
That would mean either a configuration problem (please post all the
ratios set on this pool, not the defaults) or your cache filling up
faster than it can flush.

Space usage is never perfectly equal with Ceph; you need a high enough
number of PGs for starters and then some fine-tuning.
After fiddling with the weights, my cache-tier SSD OSDs are all very
close to each other:
---
ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR
18 0.64999  1.00000 679G 543G  136G 79.96 4.35
19 0.67000  1.00000 679G 540G  138G 79.61 4.33
20 0.64999  1.00000 679G 534G  144G 78.70 4.28
21 0.64999  1.00000 679G 536G  142G 79.03 4.30
26 0.62999  1.00000 679G 540G  138G 79.57 4.33
27 0.62000  1.00000 679G 538G  140G 79.30 4.32
28 0.67000  1.00000 679G 539G  140G 79.35 4.32
29 0.69499  1.00000 679G 536G  142G 78.96 4.30
---

> My setup:
>
> ceph --admin-daemon /var/run/ceph/ceph-osd.32.asok config show | grep cache

Nearly all of these are irrelevant; please post the output of
"ceph osd pool ls detail" instead, at least for the cache pool.

Have you read the documentation and my thread on this ML titled
"Cache tier operation clarifications"?

> Can someone help? Any ideas? It is normal that whole cluster stops at
> disk full error on cache tier, I was thinking that only one of pools can
> stops and other without cache tier should still work.

Once you activate a cache tier it becomes, for all intents and purposes,
the pool it is caching for.
So any problem with it will be fatal.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
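
A minimal sketch of the sizing arithmetic discussed in the thread, using
only the numbers quoted above (4 SSDs of 188616916992 bytes each, replica
size 3, 85% headroom); the second figure follows Christian's suggestion to
plan for the loss of one SSD:
---
#!/bin/bash
# Back-of-the-envelope target_max_bytes sizing for the cache pool,
# using the values quoted in the thread. Pure arithmetic, nothing
# cluster-specific.
DRIVES=4
DRIVE_BYTES=188616916992   # 184196208K per SSD
REPLICAS=3
HEADROOM=85                # stay below the 85% near-full warning

# All four SSDs healthy:
ALL_UP=$(( DRIVES * DRIVE_BYTES / REPLICAS * HEADROOM / 100 ))

# One SSD lost and its data backfilled onto the remaining three:
ONE_DOWN=$(( (DRIVES - 1) * DRIVE_BYTES / REPLICAS * HEADROOM / 100 ))

echo "target_max_bytes, all SSDs up:  ${ALL_UP}"    # ~213765839257 (~200GB)
echo "target_max_bytes, one SSD down: ${ONE_DOWN}"  # ~160324379443 (~160GB)
---
The one-SSD-down figure lands close to the 160GB the original poster
chose, yet the cluster still blocked, which supports Christian's point
that the flush/evict ratios or per-OSD imbalance are the more likely
culprit than the raw number.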
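
For reference, a sketch of the commands involved in checking and
adjusting the thresholds Christian asks about; the pool name "cache-pool"
and the ratio/byte values are placeholders for illustration, not
recommendations:
---
# Show replica size, pg_num, hit_set and cache settings for every pool:
ceph osd pool ls detail

# Current thresholds on the cache pool:
ceph osd pool get cache-pool target_max_bytes
ceph osd pool get cache-pool cache_target_dirty_ratio
ceph osd pool get cache-pool cache_target_full_ratio

# Example values: start flushing dirty objects at 40% of target_max_bytes
# and evicting clean ones at 80%, so the tier never hits its ceiling:
ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
ceph osd pool set cache-pool cache_target_full_ratio 0.8
ceph osd pool set cache-pool target_max_bytes 160000000000
---
The per-OSD imbalance is what "fiddling with the weights" refers to; the
usual tool for that is "ceph osd crush reweight osd.<id> <weight>".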