Hello,

> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Wednesday, July 13, 2016 4:03 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Mateusz Skała <mateusz.skala@xxxxxxxxxxx>
> Subject: Re: Cache Tier configuration
>
> Hello,
>
> On Tue, 12 Jul 2016 11:01:30 +0200 Mateusz Skała wrote:
>
> > Thank you for the reply. Answers below.
> >
> > > -----Original Message-----
> > > From: Christian Balzer [mailto:chibi@xxxxxxx]
> > > Sent: Tuesday, July 12, 2016 3:37 AM
> > > To: ceph-users@xxxxxxxxxxxxxx
> > > Cc: Mateusz Skała <mateusz.skala@xxxxxxxxxxx>
> > > Subject: Re: Cache Tier configuration
> > >
> > > Hello,
> > >
> > > On Mon, 11 Jul 2016 16:19:58 +0200 Mateusz Skała wrote:
> > >
> > > > Hello Cephers.
> > > >
> > > > Can someone help me with my cache tier configuration? I have 4
> > > > identical 176GB SSD drives (184196208K) in the SSD pool; how do I
> > > > determine target_max_bytes?
> > >
> > > What exact SSD models are these?
> > > What version of Ceph?
> >
> > Intel DC S3610 (SSDSC2BX200G401), ceph version 9.2.1
> > (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>
> Good, these are decent SSDs and at 3 DWPD probably durable enough, too.
> You will want to monitor their wear-out level anyway, though.
>
> Remember, a dead cache pool means inaccessible and/or lost data.
>
> Jewel has improved cache controls and a different, less aggressive
> default behavior; you may want to consider upgrading to it, especially
> if you don't want to become a cache-tiering specialist. ^o^
>
> Also, Infernalis is no longer receiving updates.

We are planning the upgrade for the first week of August.

> > > > I assume that should be (4 drives * 188616916992 bytes) / 3
> > > > replicas = 251489222656 bytes, * 85% (because of the full-disk
> > > > warning).
> > >
> > > In theory correct, but you might want to consider (like with all
> > > pools) the impact of losing a single SSD.
> > > In short, backfilling and then the remaining 3 getting full anyway.
> >
> > OK, so it is better to make target_max_bytes lower than the space I
> > have? For example 170GB? Then I will have one OSD in reserve.
>
> Something like this, though failures with these SSDs are very unlikely.
>
> > > > That gives 213765839257 bytes, ~200GB. I set it a little bit
> > > > lower (160GB) and after some time the whole cluster stopped on a
> > > > full-disk error. One of the SSD drives was full. I see that space
> > > > usage across the OSDs is not equal:
> > > >
> > > > 32 0.17099 1.00000 175G 127G 49514M 72.47 1.77  95
> > > > 42 0.17099 1.00000 175G 120G 56154M 68.78 1.68  90
> > > > 37 0.17099 1.00000 175G 136G 39670M 77.95 1.90 102
> > > > 47 0.17099 1.00000 175G 130G 46599M 74.09 1.80  97
> > >
> > > What's the exact error message?
> > > None of these are over 85 or 95%, how are they full?
> >
> > osd.37 was full at 96%, after the error (health ERR, 1 full osd).
> > Then I set target_max_bytes to 100GB. Flushing reduced the used
> > space; now the cluster is working OK, but I want to clarify my
> > configuration.
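For clarity, this is how I arrived at the number, using the raw capacity from my first mail and the 85% headroom; the pool name "ssd" is from the pool listing further down, and the set command is just an example:

---
# 4 SSDs * 188616916992 bytes each, divided by replica size 3,
# times 0.85 headroom for the full-disk warning:
#   4 * 188616916992 / 3 * 0.85 = 213765839257 bytes (~200GB)
# Apply it to the cache pool, e.g.:
ceph osd pool set ssd target_max_bytes 213765839257
---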
> Don't confuse flushing (copying dirty objects to the backing pool)
> with eviction (deleting, really zero-ing, clean objects).
> Eviction is what frees up space, but it needs flushed (clean) objects
> to work with.

OK, so I understand that evicting is what frees the space.

> > > If the usage listing above is a snapshot of when Ceph thinks
> > > something is "full", it may be an indication that you've reached
> > > target_max_bytes and Ceph simply has no clean (flushed) objects
> > > ready to evict.
> > > Which means a configuration problem (all ratios, not the defaults,
> > > for this pool please) or your cache filling up faster than it can
> > > flush.
> >
> > The above snapshot is from now, when the cluster is working OK.
> > Filling faster than flushing is very possible; when the error
> > occurred I had the minimum 'promote' settings at 1 in the config,
> > like this:
> >
> > "osd_tier_default_cache_min_read_recency_for_promote": "1",
> > "osd_tier_default_cache_min_write_recency_for_promote": "1",
> >
> > Now I have changed this to 3, and it looks like it is working: 3
> > days without a near-full OSD.
>
> There are a number of other options to control things, especially
> with Jewel.
> Also, setting your cache mode to readforward might be a good idea,
> depending on your use case.

I'm considering this move, especially since we are also using SSD
journals. Please confirm: can I use cache tier readforward with pool
size 1? Is it safe? Then I would have 3 times more space for the cache
tier.

> > > Space is never equal with Ceph, you need a high enough number of
> > > PGs for starters and then some fine-tuning.
> > >
> > > After fiddling with the weights my cache-tier SSD OSDs are all
> > > very close to each other:
> > > ---
> > > ID WEIGHT  REWEIGHT SIZE USE  AVAIL %USE  VAR
> > > 18 0.64999 1.00000  679G 543G 136G  79.96 4.35
> > > 19 0.67000 1.00000  679G 540G 138G  79.61 4.33
> > > 20 0.64999 1.00000  679G 534G 144G  78.70 4.28
> > > 21 0.64999 1.00000  679G 536G 142G  79.03 4.30
> > > 26 0.62999 1.00000  679G 540G 138G  79.57 4.33
> > > 27 0.62000 1.00000  679G 538G 140G  79.30 4.32
> > > 28 0.67000 1.00000  679G 539G 140G  79.35 4.32
> > > 29 0.69499 1.00000  679G 536G 142G  78.96 4.30
> > > ---
> >
> > In your snapshot the used space is nearly equal, only a 1%
> > difference; I have nearly 10% differences in used space. Does that
> > depend on the number of PGs, or maybe on the weights?
>
> As I wrote, both.
> 10% suggests that you probably already have enough PGs; time to
> fine-tune the weights, see the differences in my list above.

I will check this.

> > > > My setup:
> > > >
> > > > ceph --admin-daemon /var/run/ceph/ceph-osd.32.asok config show |
> > > > grep cache
> > >
> > > Nearly all of these are irrelevant, output of "ceph osd pool ls
> > > detail" please, at least for the cache pool.
> >
> > ceph osd pool ls detail
> > pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0
> > object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 68565
> > flags hashpspool min_read_recency_for_promote 1
> > min_write_recency_for_promote 1 stripe_width 0
> >         removed_snaps [1~2,4~12,17~2e,46~ad,f9~2,fd~2,101~2]
> > pool 4 'ssd' replicated size 3 min_size 1 crush_ruleset 1
> > object_hash rjenkins pg_num 128 pgp_num 128 last_change 68913 flags
> > hashpspool,incomplete_clones tier_of 5 cache_mode writeback
> > target_bytes 182536110080 hit_set bloom{false_positive_probability:
> > 0.05, target_size: 0, seed: 0} 600s x6 stripe_width 0
> >         removed_snaps [1~3,6~2,9~2,d~8,17~6,1f~10,33~8,3f~a,4d~2,55~22,79~2]
> > pool 5 'sata' replicated size 3 min_size 1 crush_ruleset 2
> > object_hash rjenkins pg_num 128 pgp_num 128 last_change 68910 lfor
> > 66807 flags hashpspool tiers 4 read_tier 4 write_tier 4 stripe_width 0
> >         removed_snaps [1~3,6~2,9~2,d~8,17~6,1f~10,33~8,3f~a,4d~2,55~22,79~2]
>
> I'd go for 256 PGs; how big (OSDs) is your "sata" pool?

The "sata" pool has 16 OSDs and 1024 PGs.

> Christian
>
> > Cache tier on 'ssd' pool for 'sata' pool.
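To summarize the knobs mentioned in this thread as I understand them (these are per-pool settings, the values below are only examples, and readforward may need an extra confirmation flag depending on the release):

---
# Promote only objects seen in several consecutive hit sets:
ceph osd pool set ssd min_read_recency_for_promote 3
ceph osd pool set ssd min_write_recency_for_promote 3
# Keep flushing ahead of eviction so clean objects are available:
ceph osd pool set ssd cache_target_dirty_ratio 0.4
ceph osd pool set ssd cache_target_full_ratio 0.8
# Optional cache mode change discussed above:
ceph osd tier cache-mode ssd readforward
# More PGs plus weight fine-tuning to even out per-OSD usage
# (pg_num changes on a cache pool may require a force flag):
ceph osd pool set ssd pg_num 256
ceph osd pool set ssd pgp_num 256
ceph osd crush reweight osd.37 0.16   # example value for the fullest OSD
---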
> > > Have you read the documentation and my thread on this ML labeled
> > > "Cache tier operation clarifications"?
> >
> > I have read the documentation and an Intel blog post
> > (https://software.intel.com/en-us/blogs/2015/03/03/ceph-cache-tiering-introduction);
> > I will now search for your post and read it.
> >
> > > > Can someone help? Any ideas? Is it normal that the whole cluster
> > > > stops on a disk-full error in the cache tier? I was thinking
> > > > that only one of the pools would stop and the others without a
> > > > cache tier should still work.
> > >
> > > Once you activate a cache tier it becomes, for all intents and
> > > purposes, the pool it's caching for.
> > > So any problem with it will be fatal.
> >
> > OK.
> >
> > > Christian
> > > --
> > > Christian Balzer           Network/Systems Engineer
> > > chibi@xxxxxxx              Global OnLine Japan/Rakuten Communications
> > > http://www.gol.com/
> >
> > Thank you for your help.
> > Mateusz
>
> --
> Christian Balzer           Network/Systems Engineer
> chibi@xxxxxxx              Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

Regards
Mateusz
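PS: For the wear-level monitoring mentioned above, something like this should work (the device name is an example; Intel DC SSDs report wear via SMART attribute 233, Media_Wearout_Indicator):

---
# Cache pool and per-OSD utilisation:
ceph df detail
ceph osd df
# SSD wear on each cache-tier node:
smartctl -A /dev/sda | grep -i Media_Wearout
---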