On Thu, 11 Dec 2014, Blinick, Stephen L wrote:
> Hi Andreas, Thanks for providing this info last week. I've reviewed the
> data; the annotated spreadsheet is very helpful for interpreting the
> results. I do think we're seeing similar measurements even though I'm
> characterizing a more simplified environment. The key differences are
> that I'm running with no replication (pool has size=1) and the memstore
> backend. The client (smallio_bench) in this environment, with a single
> OSD and a queue depth of 1 (t=1), is seeing an average latency of about
> 1200us. My goal was to account for 100% of this latency via traces in
> order to determine where to focus (with other tools, profilers, etc).
> So far I'm covering about 750us of the 1200us, missing the time the
> commit message takes to be dispatched (received) by the client.
>
> In order to do this, I had to run the client on the same node as the
> OSD, and Vishal Verma on my team added some additional traces to the
> librados code. In queue_aio_write, I don't think there's a transaction
> ID yet, so we traced the OID in several places and then used a Perl
> script to correlate things and process the results.
>
> Beyond that, I think the analysis data is very similar to yours; I
> still need to annotate it. I opted to trace out the time as "delta
> since last trace" instead of time since start of transaction, so the
> entries with higher elapsed times indicate that whatever happened
> between those traces took a long time, not necessarily that the code
> at the tracepoint is slow.
>
> I haven't yet accounted for the entire time seen by the client. From
> the time the aio_write is queued until ms_fast_dispatch, I have about
> 300us of time.. quite large. I didn't trace the pipe:reader/writer,
> and I see for you that's ~90us of time, so that would cover about a
> third of it. Additionally, I would like to be able to trace the
> dispatch of the commit ack (complete_aio_write), but at that point I
> don't have an OID or tid to correlate to.
>
> One question about the tracing code: I am a bit concerned about using
> static sized arrays on the stack to populate with timing. The reason
> is that adding a tracepoint to a function requires updating a #define
> for the index, the array size, and the tracepoint (3 spots). If one of
> those is missed it could result in a stack overwrite. Is the reason
> for this to reduce the number of traces, or is it to make correlating
> times easier?
>
> Anyone have any ideas about the relatively high time spent between the
> client & the pipe reader in the OSD?

I think without more detailed tracepoints (or maybe a tcpdump?) it's
hard to nail it down.. it's not even clear whether it's on the client
side or the OSD side. :/

Is there a document somewhere that shows your current breakdown of time
spent?

Thanks!
sage
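
For reference, here is a minimal sketch of the fixed-size, stack-allocated
timing-array pattern the question above is about. The slot names and the
TraceTimes struct are illustrative only, not the actual Ceph tracing code;
the point is the three places that must stay in sync, plus a bounds-checked
mark() that fails safe if the slot #defines and the array size ever drift
apart.

#include <chrono>
#include <cstdint>
#include <cstdio>

// Illustrative slot indices -- NOT the real Ceph #defines.  Adding a new
// tracepoint means touching the new #define, the slot count below, and the
// call site; forgetting the count is what risks the stack overwrite.
#define TRACE_QUEUED    0
#define TRACE_DISPATCH  1
#define TRACE_COMMIT    2
#define TRACE_NSLOTS    3   // must track the highest slot + 1

static inline uint64_t now_us() {
  using namespace std::chrono;
  return duration_cast<microseconds>(
      steady_clock::now().time_since_epoch()).count();
}

struct TraceTimes {
  uint64_t t[TRACE_NSLOTS] = {0};

  // Bounds-checked variant: refuse an out-of-range slot instead of silently
  // writing past the end of the stack array.
  void mark(int slot) {
    if (slot >= 0 && slot < TRACE_NSLOTS)
      t[slot] = now_us();
  }

  // Print per-stage deltas ("time since the previous trace"), the same way
  // the numbers were reported in the mail above.
  void dump() const {
    for (int i = 1; i < TRACE_NSLOTS; ++i)
      if (t[i] && t[i - 1])
        std::printf("slot %d -> %d: %llu us\n", i - 1, i,
                    (unsigned long long)(t[i] - t[i - 1]));
  }
};

int main() {
  TraceTimes tt;               // lives on the stack, like the arrays discussed
  tt.mark(TRACE_QUEUED);
  tt.mark(TRACE_DISPATCH);
  tt.mark(TRACE_COMMIT);
  tt.dump();
  return 0;
}

The layout (one array slot per tracepoint) is unchanged, so correlating
times per stage stays as easy as before; only the unchecked indexing goes
away.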