opentracing

Sage Weil <sweil@xxxxxxxxxx> · Fri, 19 Jan 2018 19:52:13 +0000 (UTC)

I think we need to take a step back and reconsider our approach to 
tracing.  Thus far, it has been an ad hoc combination of our 
home-brew debug logs and a few lttng tracepoints.  We have some initial 
integration of blkin tracepoints as well, but I'm not sure if anyone has 
actually used them.

I'm looking at opentracing.io (see e.g. 
http://opentracing.io/documentation/) and this looks like a more viable 
path forward since it is not tied to specific tracing tools and is being 
adopted by CNCF projects.  There's also Jaeger, which (from what I 
gather) is a newer Dapper/Zipkin-style tool that will presumably be 
usable if we go this path.

I was on a call recently with a partner and they mentioned that their tool 
would consume tracepoints via opentracing, and that one of the key 
features was that it would sample instead of pulling an exhaustive 
trace.  (From what I gather this happens outside of the opentracing 
library in the application, though: it's up to the tracer to sample or 
not to sample.)
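
To make the sampling idea concrete, here is a minimal sketch of what a
tracer-side sampler might look like.  This is not the opentracing API or
any existing tracer's code; ProbabilisticSampler, is_sampled and handle_op
are hypothetical names, and hashing the trace id is just one assumed way
to make the keep/drop decision consistent for every span of a request.

```python
import hashlib


class ProbabilisticSampler:
    """Hypothetical tracer-side sampler: keeps roughly `rate` of all traces."""

    def __init__(self, rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be in [0, 1]")
        self.rate = rate
        # Cut-off point in the 64-bit hash space; ids hashing below it are kept.
        self._threshold = int(rate * (1 << 64))

    def is_sampled(self, trace_id):
        # Hash the trace id so that every span of one trace gets the same
        # decision, no matter which daemon makes it.
        digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        return value < self._threshold


def handle_op(sampler, trace_id, sink):
    # The instrumentation point is always compiled in; only sampled ops
    # are actually recorded (here, appended to a plain list).
    if sampler.is_sampled(trace_id):
        sink.append(trace_id)
```

The point of the sketch is that the application only calls handle_op();
whether anything gets recorded is entirely the sampler's decision.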

One of the looming features that we'd like to work on now that the mgr is 
in place is a 'rados top' or 'rbd top' like function that samples requests 
at the OSDs (or clients?) to build an aggregate view of the top consumers 
of ops in the cluster.  I'm wondering whether it makes sense to build this 
sort of functionality on top of generic tracepoints instead of our own 
purpose-built instrumentation.
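
As a rough illustration of the aggregation side, something like the
following could turn a stream of sampled ops into a 'top' view.  This is
only a sketch under assumptions: top_consumers is a hypothetical helper,
each sampled op is assumed to arrive as a (client, pool) tuple, and the
estimate simply scales the sampled count by 1/sample_rate.

```python
from collections import Counter


def top_consumers(sampled_ops, sample_rate, n=3):
    """Estimate the top-n clients by op count from a sample of ops.

    sampled_ops is an iterable of (client, pool) tuples (assumed format).
    sample_rate is the fraction of ops that were sampled, in (0, 1].
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    counts = Counter(client for client, _pool in sampled_ops)
    # Scale the sampled counts back up to an estimate of the real totals.
    return [(client, round(count / sample_rate))
            for client, count in counts.most_common(n)]


ops = [
    ("client.a", "rbd"), ("client.b", "rbd"), ("client.a", "rbd"),
    ("client.c", "cephfs"), ("client.a", "rbd"), ("client.b", "rbd"),
]
print(top_consumers(ops, sample_rate=0.5, n=2))
```

The same function would work whether the samples come from the OSDs or
from clients; only the source of the tuples changes.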

Is there anyone who is interested in heading this effort/investigation up?

sage