RE: High CPU usage in Tiering Agent

All OSDs now run with the patch. The log files look good (the agent flushes
in low mode with 2 ops and then waits), and CPU/disk usage is normal. Thanks
for the quick response!

Markus

-----Original Message-----
From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
Sent: Friday, February 12, 2016 21:24
To: Markus Blank-Burian <burian@xxxxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: RE: High CPU usage in Tiering Agent

On Fri, 12 Feb 2016, Markus Blank-Burian wrote:
> In my tests, as a workaround, I have set osd_agent_max_low_ops ==
> osd_agent_max_ops. As expected, while in low flush mode the agent now
> only iterates over the objects until it finds something to flush, then
> waits, and does nothing when idle. I can see this in the log files. So
> if this fix has no side effects, I can test it later on our production
> system.
> For a more advanced solution that further reduces the agent's load, I
> know too little about the internals; I only read through the code for
> the first time today to find out what's going on.

Okay, I looked at this some more and see the issue now.  It's a bit
confusing because the low/high thing is flush based, but the agent does
evictions too.  This basically means we may fail to keep the maximum number
of evict ops in flight when in low mode, but that seems just fine to me.

Can you take a look at this patch?

	https://github.com/ceph/ceph/pull/7631

Thanks!
sage

> 
> Markus
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> Sent: Friday, February 12, 2016 19:13
> To: Markus Blank-Burian <burian@xxxxxxxxxxx>
> Subject: Re: High CPU usage in Tiering Agent
> 
> On Fri, 12 Feb 2016, Markus Blank-Burian wrote:
> > Hi,
> > 
> > I had an issue with major CPU usage on several OSDs. Looking further
> > into this matter, it turned out that the tiering agent was iterating
> > over directories again and again. The metadata was all cached, so there
> > was no disk activity.
> > 
> > If we have, e.g., flush mode low and evict mode idle, then only
> > osd_agent_max_low_ops concurrent requests are allowed, but the check
> > in OSDService::agent_entry only waits if agent_ops >=
> > osd_agent_max_ops. It would be straightforward to add an additional
> > check for osd_agent_max_low_ops. Would there be any issues with the
> > modifications in the attached patch?
> 
> That sounds reasonable to me.  Does it address the core problem though?  
> It sounds like what we really need is to limit the amount of non-op 
> work the agent does...
> 
> sage
> 
> 


