RE: High CPU usage in Tiering Agent

On Fri, 12 Feb 2016, Markus Blank-Burian wrote:
> In my tests, as a workaround, I have set osd_agent_max_low_ops equal to
> osd_agent_max_ops. As expected, while in low flush mode the agent now only
> iterates the objects until it finds something to flush, then waits, and
> does nothing when idle. I can see this from the log files. So if there is
> no side effect to this fix, I can test it later on our production system.
> For any advanced solution that further reduces the agent's load, I know too
> little about the internals. I just read through the code for the first
> time today to find out what's going on.

Okay, I looked at this some more and see the issue now.  It's a bit 
confusing because the low/high thing is flush based but the agent does 
evicts too.  This basically means we may fail to keep up to the max number 
of evict ops in flight when in low mode, but that seems just fine to me.

Can you take a look at this patch?

	https://github.com/ceph/ceph/pull/7631

Thanks!
sage

> 
> Markus
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
> Sent: Freitag, 12. Februar 2016 19:13
> To: Markus Blank-Burian <burian@xxxxxxxxxxx>
> Subject: Re: High CPU usage in Tiering Agent
> 
> On Fri, 12 Feb 2016, Markus Blank-Burian wrote:
> > Hi,
> > 
> > I had an issue with major CPU usage on several OSDs. Looking further 
> > into this matter, it turned out that the tiering agent was iterating 
> > directories over and over. The metadata was all cached, so there was no
> > disk activity.
> > 
> > If we have e.g. flush-mode low and evict-mode idle, then only 
> > osd_agent_max_low_ops concurrent requests are allowed, but the check 
> > in OSDService::agent_entry only waits if agent_ops >= 
> > osd_agent_max_ops. It would be straightforward to add an additional 
> > check for osd_agent_max_low_ops. Would there be any issues with the 
> > modifications in the attached patch?
> 
> That sounds reasonable to me.  Does it address the core problem though?  
> It sounds like what we really need is to limit the amount of non-op work the
> agent does...
> 
> sage
> 
> 
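
For readers following the thread, the wait condition described in the
original report can be modelled in a few lines. This is only a sketch under
stated assumptions: the struct, its fields, and both function names are
hypothetical and are not taken from the Ceph source or from the patch in
the pull request above. It contrasts a check that consults only
osd_agent_max_ops with one that also honours osd_agent_max_low_ops while
the agent is in low flush mode.

```cpp
// Simplified model of the throttle check discussed in this thread.
// All names below are hypothetical; this is not the Ceph implementation.
struct AgentThrottle {
  int agent_ops;         // flush/evict ops currently in flight
  int max_ops;           // stands in for osd_agent_max_ops
  int max_low_ops;       // stands in for osd_agent_max_low_ops
  bool flush_mode_high;  // true when the agent is in high flush mode
};

// Behaviour as reported: the agent thread waits only at the high limit,
// so in low mode it keeps scanning although no further op may start.
constexpr bool should_wait_reported(const AgentThrottle& t) {
  return t.agent_ops >= t.max_ops;
}

// Behaviour as proposed: in low mode, also wait once the low limit
// has been reached.
constexpr bool should_wait_proposed(const AgentThrottle& t) {
  return t.agent_ops >= t.max_ops ||
         (!t.flush_mode_high && t.agent_ops >= t.max_low_ops);
}
```

With illustrative limits of 4 and 2, low flush mode, and two ops in flight,
should_wait_reported() returns false (the agent keeps iterating), while
should_wait_proposed() returns true (the agent sleeps until an op
completes). In high flush mode both functions agree.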