@Christian, thanks for the quick answer, please look below.

> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Monday, July 3, 2017 1:39 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Mateusz Skała <mateusz.skala@xxxxxxxxxxx>
> Subject: Re: Cache Tier or any other possibility to accelerate RBD with SSD?
>
>
> Hello,
>
> On Mon, 3 Jul 2017 13:01:06 +0200 Mateusz Skała wrote:
>
> > Hello,
> >
> > We are using cache-tier in Read-forward mode (replica 3) to
> > accelerate reads, and journals on SSD to accelerate writes.
>
> OK, lots of things wrong with this statement, but firstly, Ceph version (it is
> relevant) and more details about your setup and SSDs used would be
> interesting and helpful.
>

Sorry about this. The Ceph version is 0.92.1 and we plan to upgrade to 10.2.0 shortly.

About the configuration: 4 nodes, each node with:
- 4x HDD WD Re 2TB WD2004FBYZ,
- 2x SSD Intel S3610 200GB (one for journal and system with mon, the second for cache-tier).

This gives 32TB of raw HDD space and only 600GB of raw SSD space, and I think the small size of the cache is a problem.

> If you had searched the ML archives for readforward you'd come across a
> very recent thread by me, in which the powers that be state that this mode is
> dangerous and not recommended.
> During quite some testing with this mode I never encountered any problems,
> but consider yourself warned.
>
> Now readforward will FORWARD reads to the backing storage, so it will
> NEVER accelerate reads (promote them to the cache-tier).
> The only speedup you will see is for objects that have been previously
> written and are still in the cache-tier.
>

ceph osd pool ls detail
pool 4 'ssd' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 88643 flags hashpspool,incomplete_clones tier_of 5 cache_mode readforward target_bytes 176093659136 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 120s x6 min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
        removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]
pool 5 'sata' replicated size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 512 pgp_num 512 last_change 88643 lfor 66807 flags hashpspool tiers 4 read_tier 4 write_tier 4 stripe_width 0
        removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]

The setup is over a year old. In ceph status I see flushing, promote and evicting operations. Maybe that is because of my old version?

> Using cache-tiers can work beautifully if you understand the I/O patterns
> involved (tricky on a cloud storage with very mixed clients), can make your
> cache-tier large enough to cover the hot objects (working set) or at least (as
> you are attempting) to segregate the read and write paths as much as
> possible.
>

Do you have any good method to analyze the workload? I found these scripts, https://github.com/cernceph/ceph-scripts, and tried to look at reads and writes per request length, but how do I know whether the I/O is random or sequential?
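For example, something like this might give a rough picture (a minimal shell sketch, not one of the CERN scripts; it assumes the OSD admin socket is reachable on the OSD host and that the op descriptions carry "read OFFSET~LENGTH" / "write OFFSET~LENGTH" pairs, which varies between releases):

  # Sample a handful of recent ops from one OSD and pull out the offsets.
  ceph daemon osd.0 dump_historic_ops | grep -oE '(read|write) [0-9]+~[0-9]+'
  # If successive offsets on the same object are contiguous
  # (offset + length == next offset) the I/O is mostly sequential;
  # widely scattered offsets point to random I/O. Repeat over several OSDs
  # and include the rbd_data object name in the pattern for a per-object view.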
> > We are using only RBD. Based
> > on the ceph-docs, RBD has a bad I/O pattern for cache tier. I'm
> > looking for information about other possibilities to accelerate reads on
> > RBD with SSD drives.
> >
> The documentation rightly warns about things, so people don't have
> unrealistic expectations. However YOU need to look at YOUR loads, patterns
> and usage and then decide if it is beneficial or not.
>
> As I hinted above, analyze your systems: are the reads actually slow, or are
> they slowed down by competing writes to the same storage?
>
> Cold reads (OSD server just rebooted, no cache has that object in it) will
> obviously not benefit from any scheme.
>
> Reads from the HDD OSDs can very much benefit from having enough RAM to
> hold all the SLAB objects (dentry etc.) in memory, so you can avoid disk
> access to actually find the object.
>
> For speeding up the actual data reads you have the option of the cache-tier (in
> writeback mode, with proper promotion and retention configuration).
>
> Or something like bcache on the OSD servers, discussed here several times
> as well.
>
> > The second question: is there any cache tier mode where the replica can be
> > set to 1, for the best use of SSD space?
> >
> A cache-tier (the same is true for any other real cache method) will at any
> given time have objects in it that are NOT on the actual backing storage when
> it is used to cache writes.
> So it needs to be just as redundant as the rest of the system, at least a replica
> of 2 with sufficiently small/fast SSDs.
>

OK, I understand.

> With bcache etc. just caching reads you can get away with a single replica
> of course, however failing SSDs may then cause your cluster to melt down.
>

I will search the ML for this.

> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
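For reference, the writeback-mode promotion and retention settings Christian mentions could look roughly like the following for the 'ssd' pool shown earlier in this thread. This is a sketch only: the numeric values are illustrative rather than recommendations, option names and defaults differ between Hammer and Jewel, and changing the cache mode of a live pool should be tested carefully first.

  # Sketch only -- tune to your own workload; target_max_bytes reuses the
  # target_bytes value from the pool listing above.
  ceph osd tier cache-mode ssd writeback
  ceph osd pool set ssd hit_set_type bloom
  ceph osd pool set ssd hit_set_count 6
  ceph osd pool set ssd hit_set_period 120
  ceph osd pool set ssd min_read_recency_for_promote 2
  ceph osd pool set ssd min_write_recency_for_promote 2
  ceph osd pool set ssd target_max_bytes 176093659136
  ceph osd pool set ssd cache_target_dirty_ratio 0.4
  ceph osd pool set ssd cache_target_full_ratio 0.8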