RE: Cache pool latency impact

On Wed, 14 Jan 2015, Pavan Rallabhandi wrote:
> Thanks for the reply Sage; please ignore the same subject mails on 
> ceph-users, they seem to have got delivered today.
> 
> > Hmm, we could have a 'noagent' option (similar to noout, nobackfill, 
> > noscrub, etc.) that lets the admin tell the system to stop tiering 
> > movements, but I'm not sure that's what you're asking for...
> 
> I was not aware of the 'notieragent' flag, but I was hinting at a
> flow-control mechanism that would throttle client IOs against the
> service time the tiering agent needs to flush/evict.

There is also

	osd_agent_max_ops = 4

which is a coarse control but may be sufficient for you?
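
As a rough sketch of how that could be applied (the value 2 is only an
illustration, not a recommendation), it can go in the [osd] section of
ceph.conf:

	[osd]
	osd agent max ops = 2

or be injected into running OSDs, assuming your release accepts the
change at runtime:

	ceph tell osd.* injectargs '--osd_agent_max_ops 2'

A lower value means fewer concurrent agent flush operations per OSD, so
client IO competes less with the agent, but the cache also drains more
slowly.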

sage



> Thanks,
> -Pavan.
> 
> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> Sent: Tuesday, January 13, 2015 7:31 PM
> To: Pavan Rallabhandi
> Cc: Ceph Development
> Subject: Re: Cache pool latency impact
> 
> On Tue, 13 Jan 2015, Pavan Rallabhandi wrote:
> > Hi,
> >
> > This is regarding cache pools and the impact of the flush/evict on the
> > client IO latencies.
> >
> > Am seeing a direct impact on the client IO latencies (making them
> > worse) when flush/evict is triggered on the cache pool. In a constant
> > ingress of IOs on the cache pool, the write performance is no better
> > than without cache pool, because it is limited to the speed at which
> > objects can be flushed/evicted to the backend pool.
> 
> Yeah, this is always going to be true in general.  It is a lot more work to write into the cache, read it back, write it again into the base pool, and then delete it from the cache than it is to write directly to the base pool.
> 
> > The questions I have are:
> >
> > 1) When the flush/evict is in progress, are the writes on the cache
> > pool blocked, either at the PG or at object granularity? I do see
> > a blocking flag honored per object context in
> > ReplicatedPG::start_flush(), though most of the callers seem to set
> > the flag to false.
> 
> Normally they are not blocked.  The agent starts working (finding objects to flush or evict) long before we hit the cutoff where it starts blocking.  Once it does hit that threshold, though, things can get slow, because new cache creates aren't allowed until some eviction completes.  You don't want to be in this situation.  :)
> 
> In general, if you are injecting a lot of data, caching (at least in firefly) isn't a terribly good idea.  The exception would probably be when you have a high skew toward recent data (say you are injecting market data, and do tons of analytics on the last 24 hours, but then the data gets colder).
> 
> I can't tell if you're in the situation where the cache pool is full and the agent is flushing/evicting anything and everything and writes are crawling (you should see a message in 'ceph health' when this happens), or whether the agent is alive but working with low effort and the impact is still high.  If it's the latter, I'm not sure yet what is going wrong.  Perhaps you can capture a few minutes of log from one of your OSDs (debug ms = 1, debug osd = 20)?
> 
> > 2) Is there any mechanism (that I might have overlooked) to avoid this
> > situation, by throttling the flush/evict operations on the fly? If
> > not, shouldn't there be one?
> 
> Hmm, we could have a 'noagent' option (similar to noout, nobackfill, noscrub, etc.) that lets the admin tell the system to stop tiering movements, but I'm not sure that's what you're asking for...
> 
> sage
> 