Re: Performance counters oddities, cache tier and otherwise

Hello Nick,

On Thu, 7 Apr 2016 10:03:27 +0100 Nick Fisk wrote:

> Hi Christian,
> 
>  
> > Hello,
> > 
> > Ceph 0.94.5 for the record.
> > 
> > As some may remember, I phased in a 2TB cache tier 5 weeks ago.
> > 
> > About now it has reached about 60% usage, which is what I have the
> > cache_target_dirty_ratio set to.
> > 
> > And for the last 3 days I could see some writes (op_in_bytes) to the
> backing
> > storage (aka HDD pool), which hadn't seen any write action for the
> > aforementioned 5 weeks.
> > Alas my graphite dashboard showed no flushes (tier_flush), whereas
> > tier_promote on the cache pool could always be matched more or less to
> > op_out_bytes on the HDD pool.
> 
> From what I understand a read that will lead to a promotion will be
> proxied 1st and then promoted. I don't know if what you are seeing is
> actually the proxy read matching up with the promotion count, rather
> than the promotion reads???
> 
I wasn't actually trying to match them up exactly, but given that we have
"tier_proxy_read": 12279
for the same OSD as below, what you're saying definitely seems to be
correct.
It explains the higher-than-expected activity I'm seeing in the graphs, but
since they match up timing-wise that's fine.
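For the record, those counters come straight out of the admin socket via
"ceph daemon osd.N perf dump". A small sketch of picking the cache-tier
ones out of that JSON; the sample values are just the ones from this
thread, and the helper name is mine:

```python
import json

# Illustrative extract of "ceph daemon osd.N perf dump" output; the
# values are the ones quoted in this thread, trimmed to the tier counters.
SAMPLE = '''{"osd": {"tier_promote": 49776, "tier_proxy_read": 12279,
             "tier_flush": 0, "tier_try_flush": 558, "agent_flush": 558}}'''

def tier_counters(perf_dump_json):
    """Pull the cache-tier counters out of a perf dump blob."""
    osd = json.loads(perf_dump_json)["osd"]
    keys = ("tier_promote", "tier_proxy_read", "tier_flush",
            "tier_try_flush", "agent_flush")
    # Missing counters default to 0 so older releases don't blow up.
    return {k: osd.get(k, 0) for k in keys}

counters = tier_counters(SAMPLE)
# A read that triggers a promotion is proxied first, so tier_proxy_read
# ticks alongside tier_promote (writes promote without a proxied read).
print(counters["tier_proxy_read"], counters["tier_promote"])
```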

> > 
> > The documentation (RH site) just parrots the names of the various perf
> > counters, so no help there. OK, let's look at what we got:
> > ---
> >         "tier_promote": 49776,
> >         "tier_flush": 0,
> >         "tier_flush_fail": 0,
> >         "tier_try_flush": 558,
> >         "tier_try_flush_fail": 0,
> >         "agent_flush": 558,
> >         "tier_evict": 0,
> >         "agent_evict": 0,
> > ---
> > Lots of promotions, that's fine.
> > Not a single tier_flush, er. wot? So what does this denote then?
> > OK, clearly tier_try_flush and agent_flush are where the flushing is
> actually
> > recorded (in my test cluster they differ, as I have run that against
> > the
> wall
> > several times).
> > No evictions yet, that will happen at 90% usage.
> > 
> > So now I changed the graph data source for flushes to tier_try_flush,
> > however that does not match most of the op_in_bytes (or any other
> > counter I
> > tried!) on the HDDs.
> 
> Hmmm...like above maybe promotion/flush IO is not included in the
> counters.
> 
> I know from looking at the counters that there are op_bytes and
> subop_bytes, so it appears the OSD's do differentiate between different
> types of operations (client/replication). Maybe there needs to be
> cacheop_bytes counters?
> 
That was the missing link. It needs to be both, as it turns out.
If I display both op and subop I get a match for all the flushes.
Not sure why it seemingly randomly distributes between those two types.
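Presumably the split is just which OSD happens to be primary for the
target PG: the flush lands as an op on the primary and as subops on the
replicas. Either way, summing both per OSD gives the real write volume; a
sketch with invented per-OSD numbers:

```python
# Flush traffic on the backing pool shows up split across op_in_bytes
# (writes to the PG primary) and subop_in_bytes (replication traffic),
# so both have to be summed. The OSD names and byte counts below are
# made up purely for illustration.
perf = {
    "osd.10": {"op_in_bytes": 4 * 1024**2, "subop_in_bytes": 9 * 1024**2},
    "osd.11": {"op_in_bytes": 7 * 1024**2, "subop_in_bytes": 5 * 1024**2},
}

def total_write_bytes(osd_perf):
    """Sum primary and replication write bytes across the backing OSDs."""
    return sum(c["op_in_bytes"] + c["subop_in_bytes"]
               for c in osd_perf.values())

print(total_write_bytes(perf))  # 26214400 bytes == 25 MiB
```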

Thanks, that leaves the mystery of tier_flush.
My guess would be that tier_flush counts the "flush or die" case, i.e.
flushes forced because the tier is full and clean objects are needed for
eviction; this counter is non-zero on my test cluster, where I indeed
created those situations.
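If that guess is right, then tier_flush staying at zero is itself a cheap
health signal: any growth would mean the tier hit eviction pressure. A
sketch with hypothetical before/after samples (counter semantics as
guessed above, not confirmed):

```python
# Two hypothetical snapshots of the flush counters, e.g. taken by a
# graphite/collectd poller. The values are invented for the example.
before = {"tier_flush": 0, "tier_try_flush": 558, "tier_flush_fail": 0}
after = {"tier_flush": 3, "tier_try_flush": 600, "tier_flush_fail": 0}

def forced_flushes(old, new):
    """Delta of the (presumed) pressure-driven flush counter."""
    return new["tier_flush"] - old["tier_flush"]

print(forced_flushes(before, after))  # 3 forced flushes since last sample
```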

Christian
> > As in, there are flushes but no activity on the HDD OSDs as far as Ceph
> seems
> > to be concerned.
> > I can however match the flushes to actual disk activity on the HDDs
> (gathered
> > by collectd), which are otherwise totally dormant.
> > 
> > Can somebody shed some light on this, is it a known problem, in need
> > of a bug report?
> > 
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/