Re: Performance counters oddities, cache tier and otherwise

Hi Christian,

 
> Hello,
> 
> Ceph 0.94.5 for the record.
> 
> As some may remember, I phased in a 2TB cache tier 5 weeks ago.
> 
> About now it has reached about 60% usage, which is what I have the
> cache_target_dirty_ratio set to.
> 
> And for the last 3 days I could see some writes (op_in_bytes) to the
> backing storage (aka HDD pool), which hadn't seen any write action for
> the aforementioned 5 weeks.
> Alas my graphite dashboard showed no flushes (tier_flush), whereas
> tier_promote on the cache pool could always be matched more or less to
> op_out_bytes on the HDD pool.

From what I understand, a read that will lead to a promotion is proxied
first and then promoted. I don't know if what you are seeing is actually
the proxy read matching up with the promotion count, rather than the
promotion reads?
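One way to check that from the outside: "ceph osd pool stats -f json"
reports per-pool client IO rates, so if the HDD pool's read rate tracks
the cache pool's promotion rate, the proxied reads are the likely
source. A minimal sketch below; the JSON sample is fabricated, and the
field names are my assumption about the Hammer output, so verify against
a real dump first.

```python
import json

# Fabricated sample of "ceph osd pool stats -f json" output; the
# "client_io_rate" field names are an assumption, not verified output.
SAMPLE = json.loads("""
[
  {"pool_name": "cache", "client_io_rate": {"read_bytes_sec": 52428800,
                                            "write_bytes_sec": 1048576}},
  {"pool_name": "hdd",   "client_io_rate": {"read_bytes_sec": 41943040,
                                            "write_bytes_sec": 0}}
]
""")

def read_rate(stats, pool):
    """Client read rate (bytes/sec) for the named pool, 0 if absent."""
    for p in stats:
        if p["pool_name"] == pool:
            return p["client_io_rate"].get("read_bytes_sec", 0)
    return 0

print(read_rate(SAMPLE, "hdd"))   # 41943040
```

Polling that alongside the tier_promote counter should show whether the
two move together.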

> 
> The documentation (RH site) just parrots the names of the various perf
> counters, so no help there. OK, let's look at what we've got:
> ---
>         "tier_promote": 49776,
>         "tier_flush": 0,
>         "tier_flush_fail": 0,
>         "tier_try_flush": 558,
>         "tier_try_flush_fail": 0,
>         "agent_flush": 558,
>         "tier_evict": 0,
>         "agent_evict": 0,
> ---
> Lots of promotions, that's fine.
> Not a single tier_flush, er. wot? So what does this denote then?
> OK, clearly tier_try_flush and agent_flush are where the flushing is
> actually recorded (in my test cluster they differ, as I have run that
> against the wall several times).
> No evictions yet, that will happen at 90% usage.
> 
> So now I changed the graph data source for flushes to tier_try_flush,
> however that does not match most of the op_in_bytes (or any other
> counter I tried!) on the HDDs.

Hmmm... as above, maybe the promotion/flush IO is simply not included in
those counters.

I know from looking at the counters that there are op_bytes and
subop_bytes, so it appears the OSDs do differentiate between different
types of operations (client vs. replication). Maybe there needs to be a
cacheop_bytes counter as well?
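One other thing worth ruling out when matching these against a graph:
the perf counters are cumulative, so a raw tier_try_flush value will
never line up with a per-interval byte rate; you have to diff two
successive "ceph daemon osd.N perf dump" samples. A small sketch, with
fabricated sample values:

```python
# Diff two successive perf-dump samples for the tier-related counters.
# The dicts below stand in for the "osd" section of two consecutive
# "ceph daemon osd.N perf dump" outputs; the values are made up.
TIER_KEYS = ("tier_promote", "tier_try_flush", "agent_flush", "tier_evict")

def tier_deltas(before, after):
    """Per-interval change in the cumulative tier counters."""
    return {k: after[k] - before[k] for k in TIER_KEYS}

before = {"tier_promote": 49776, "tier_try_flush": 558,
          "agent_flush": 558, "tier_evict": 0}
after = {"tier_promote": 49820, "tier_try_flush": 560,
         "agent_flush": 560, "tier_evict": 0}

print(tier_deltas(before, after))
# {'tier_promote': 44, 'tier_try_flush': 2, 'agent_flush': 2, 'tier_evict': 0}
```

If graphite is already doing a derivative on the series this won't
explain the mismatch, but it's a cheap thing to confirm.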

> As in, there are flushes but no activity on the HDD OSDs as far as
> Ceph is concerned.
> I can however match the flushes to actual disk activity on the HDDs
> (gathered by collectd), which are otherwise totally dormant.
> 
> Can somebody shed some light on this, is it a known problem, in need of a
> bug report?
> 
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



