Hi Christian,

> Hello,
>
> Ceph 0.94.5 for the record.
>
> As some may remember, I phased in a 2TB cache tier 5 weeks ago.
>
> About now it has reached about 60% usage, which is what I have the
> cache_target_dirty_ratio set to.
>
> And for the last 3 days I could see some writes (op_in_bytes) to the backing
> storage (aka HDD pool), which hadn't seen any write action for the
> aforementioned 5 weeks.
> Alas my graphite dashboard showed no flushes (tier_flush), whereas
> tier_promote on the cache pool could always be matched more or less to
> op_out_bytes on the HDD pool.

From what I understand, a read that leads to a promotion is proxied first
and then promoted. I don't know whether what you are seeing is actually the
proxy reads matching up with the promotion count, rather than the promotion
reads themselves?

>
> The documentation (RH site) just parrots the names of the various perf
> counters, so no help there. OK, let's look at what we got:
> ---
> "tier_promote": 49776,
> "tier_flush": 0,
> "tier_flush_fail": 0,
> "tier_try_flush": 558,
> "tier_try_flush_fail": 0,
> "agent_flush": 558,
> "tier_evict": 0,
> "agent_evict": 0,
> ---
> Lots of promotions, that's fine.
> Not a single tier_flush, er. wot? So what does this denote then?
> OK, clearly tier_try_flush and agent_flush are where the flushing is actually
> recorded (in my test cluster they differ, as I have run that against the wall
> several times).
> No evictions yet, that will happen at 90% usage.
>
> So now I changed the graph data source for flushes to tier_try_flush,
> however that does not match most of the op_in_bytes (or any other counter I
> tried!) on the HDDs.

Hmmm... as above, maybe promotion/flush IO is not included in those
counters. I know from looking at the counters that there are op_bytes and
subop_bytes, so it appears the OSDs do differentiate between types of
operations (client/replication). Maybe there need to be cacheop_bytes
counters as well?
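If it helps to test that guess, here is a rough sketch of how the two sets of counters could be compared over the same interval. It assumes the snapshots are the JSON printed by `ceph daemon osd.<id> perf dump` and that the counters named in this thread sit in an "osd" section; both are assumptions on my part and may differ between releases. The helper name `counter_deltas` and all sample numbers other than the 558 flushes quoted above are made up for illustration.

```python
import json

# Counter names as quoted in this thread. Their location in the dump
# (assumed here: the "osd" section) may differ between Ceph releases.
TIER_COUNTERS = ("tier_promote", "tier_flush", "tier_try_flush",
                 "agent_flush", "tier_evict", "agent_evict")
IO_COUNTERS = ("op_in_bytes", "op_out_bytes")


def counter_deltas(before, after, names, section="osd"):
    """Return {name: after - before} for counters present in both dumps."""
    old = before.get(section, {})
    new = after.get(section, {})
    return {n: new[n] - old[n] for n in names if n in old and n in new}


# Hypothetical snapshots of a cache-tier OSD, taken some time apart.
cache_before = json.loads(
    '{"osd": {"tier_flush": 0, "tier_try_flush": 500, "agent_flush": 500}}')
cache_after = json.loads(
    '{"osd": {"tier_flush": 0, "tier_try_flush": 558, "agent_flush": 558}}')

# Hypothetical snapshots of an HDD OSD over the same interval.
hdd_before = json.loads('{"osd": {"op_in_bytes": 1000}}')
hdd_after = json.loads('{"osd": {"op_in_bytes": 1000}}')

cache_deltas = counter_deltas(cache_before, cache_after, TIER_COUNTERS)
hdd_deltas = counter_deltas(hdd_before, hdd_after, IO_COUNTERS)

# Flushes on the cache side with no op_in_bytes growth on the HDD side
# would be consistent with flush IO not being counted as client ops.
for name, delta in sorted({**cache_deltas, **hdd_deltas}.items()):
    print(f"{name}: {delta:+d}")
```

Running this against real dumps from a cache OSD and an HDD OSD, alongside the collectd disk stats, should show whether flushes ever move op_in_bytes at all.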

> As in, there are flushes but no activity on the HDD OSDs as far as Ceph seems
> to be concerned.
> I can however match the flushes to actual disk activity on the HDDs (gathered
> by collectd), which are otherwise totally dormant.
>
> Can somebody shed some light on this, is it a known problem, in need of a
> bug report?
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com