Hello,

On Mon, 11 Jul 2016 16:19:58 +0200 Mateusz Skała wrote:

> Hello Cephers.
>
> Can someone help me in my cache tier configuration? I have 4 same SSD
> drives 176GB (184196208K) in SSD pool, how to determine target_max_bytes?

What exact SSD models are these?
What version of Ceph?

> I assume that should be (4 drives * 188616916992 bytes) / 3 replica =
> 251489222656 bytes * 85% (because of full disk warning)

In theory correct, but you might want to consider (like with all pools)
the impact of losing a single SSD.
In short, its data gets backfilled onto the remaining 3 SSDs, which may
then fill up anyway.

> It will be 213765839257 bytes ~200GB. I make this little bit lower
> (160GB) and after some time whole cluster stops on full disk error. One
> of SSD drives are full. I see that use of space at the osd is not equal:
>
> 32 0.17099 1.00000 175G 127G 49514M 72.47 1.77  95
> 42 0.17099 1.00000 175G 120G 56154M 68.78 1.68  90
> 37 0.17099 1.00000 175G 136G 39670M 77.95 1.90 102
> 47 0.17099 1.00000 175G 130G 46599M 74.09 1.80  97

What's the exact error message?
None of these OSDs are over 85% or 95%, so how are they "full"?

If the above is a snapshot from when Ceph thinks something is "full", it
may be an indication that you've reached target_max_bytes and Ceph simply
has no clean (flushed) objects ready to evict.
That would mean either a configuration problem (please post all the
ratios set on this pool, not the defaults) or your cache filling up
faster than it can flush.

Space usage is never perfectly equal with Ceph; you need a high enough
number of PGs for starters and then some fine-tuning.
After fiddling with the weights, my cache-tier SSD OSDs are all very
close to each other:
---
ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR
18 0.64999  1.00000 679G 543G  136G 79.96 4.35
19 0.67000  1.00000 679G 540G  138G 79.61 4.33
20 0.64999  1.00000 679G 534G  144G 78.70 4.28
21 0.64999  1.00000 679G 536G  142G 79.03 4.30
26 0.62999  1.00000 679G 540G  138G 79.57 4.33
27 0.62000  1.00000 679G 538G  140G 79.30 4.32
28 0.67000  1.00000 679G 539G  140G 79.35 4.32
29 0.69499  1.00000 679G 536G  142G 78.96 4.30
---

> My setup:
>
> ceph --admin-daemon /var/run/ceph/ceph-osd.32.asok config show | grep cache

Nearly all of these are irrelevant; please post the output of
"ceph osd pool ls detail" instead, at least for the cache pool.

Have you read the documentation and my thread on this ML titled
"Cache tier operation clarifications"?

> Can someone help? Any ideas? It is normal that whole cluster stops at
> disk full error on cache tier, I was thinking that only one of pools can
> stops and other without cache tier should still work.

Once you activate a cache tier it becomes, for all intents and purposes,
the pool it is caching for.
So any problem with it will be fatal.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
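
A minimal sketch of the sizing arithmetic discussed in the thread, using
only the numbers quoted above (4 SSDs of 188616916992 bytes each, replica
size 3, 85% headroom); the second figure follows Christian's suggestion to
plan for the loss of one SSD:
---
#!/bin/bash
# Back-of-the-envelope target_max_bytes sizing for the cache pool,
# using the values quoted in the thread. Pure arithmetic, nothing
# cluster-specific.
DRIVES=4
DRIVE_BYTES=188616916992   # 184196208K per SSD
REPLICAS=3
HEADROOM=85                # stay below the 85% near-full warning

# All four SSDs healthy:
ALL_UP=$(( DRIVES * DRIVE_BYTES / REPLICAS * HEADROOM / 100 ))

# One SSD lost and its data backfilled onto the remaining three:
ONE_DOWN=$(( (DRIVES - 1) * DRIVE_BYTES / REPLICAS * HEADROOM / 100 ))

echo "target_max_bytes, all SSDs up:  ${ALL_UP}"    # ~213765839257 (~200GB)
echo "target_max_bytes, one SSD down: ${ONE_DOWN}"  # ~160324379443 (~160GB)
---
The one-SSD-down figure lands close to the 160GB the original poster
chose, yet the cluster still blocked, which supports Christian's point
that the flush/evict ratios or per-OSD imbalance are the more likely
culprit than the raw number.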
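
For reference, a sketch of the commands involved in checking and
adjusting the thresholds Christian asks about; the pool name "cache-pool"
and the ratio/byte values are placeholders for illustration, not
recommendations:
---
# Show replica size, pg_num, hit_set and cache settings for every pool:
ceph osd pool ls detail

# Current thresholds on the cache pool:
ceph osd pool get cache-pool target_max_bytes
ceph osd pool get cache-pool cache_target_dirty_ratio
ceph osd pool get cache-pool cache_target_full_ratio

# Example values: start flushing dirty objects at 40% of target_max_bytes
# and evicting clean ones at 80%, so the tier never hits its ceiling:
ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
ceph osd pool set cache-pool cache_target_full_ratio 0.8
ceph osd pool set cache-pool target_max_bytes 160000000000
---
The per-OSD imbalance is what "fiddling with the weights" refers to; the
usual tool for that is "ceph osd crush reweight osd.<id> <weight>".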