Hi,
I thought I would share some results from tracing I've been doing
recently. I built Ceph (OSD only) with GCC's -finstrument-functions
option, and traced it and the kernel simultaneously using LTTng (no
manual instrumentation of the source is necessary, though rebuilding
Ceph is required).
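
For anyone who hasn't used -finstrument-functions: GCC inserts a call
to __cyg_profile_func_enter() and __cyg_profile_func_exit() around every
instrumented function, and the tracer only has to provide those two
hooks. Below is a minimal sketch of such hooks. It is not what LTTng
does internally; the counters (g_depth, g_entries) and the stderr
output are assumptions for illustration, where a real tracer would emit
a trace event instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Illustrative state only; a real tracer would record an event instead.
static thread_local int g_depth = 0;      // current nesting depth per thread
static std::atomic<long> g_entries{0};    // total function entries seen

extern "C" {

// Called by GCC-generated code on entry to every instrumented function.
// no_instrument_function keeps the hook from instrumenting itself.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    ++g_entries;
    ++g_depth;
    // this_fn is the address of the entered function; resolve it to a
    // name offline rather than inside the hook.
    std::fprintf(stderr, "%*senter %p\n", 2 * (g_depth - 1), "", this_fn);
}

// Called by GCC-generated code just before an instrumented function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    assert(g_depth > 0);
    std::fprintf(stderr, "%*sexit  %p\n", 2 * (g_depth - 1), "", this_fn);
    --g_depth;
}

}  // extern "C"
```

Linking a file like this into a program compiled with
-finstrument-functions prints an indented enter/exit line for each call,
which is the raw material for the call stack view.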
In [1]:
- the top view shows the state of the OSD process (22983), which
switches between the running (green) and blocked (yellow) states
- the middle view shows the critical path of that process; in other
words, it shows what process 22983 was waiting for whenever it was in
the blocked (yellow) state
- the bottom view shows the call stack graph, i.e. all function calls
in OSD code, as recorded through -finstrument-functions
Image [2] shows a closer view of the call stack, and image [3] shows a
closer view of the critical path.
The code is in branch [4].
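
To give an idea of how a call stack view can be derived from the
entry/exit events, here is a rough sketch that replays the events of a
single thread and sums the inclusive time spent in each function. The
TraceEvent struct and inclusive_time() are hypothetical names made up
for this example, not part of LTTng or of the branch above, and
recursive calls would be counted more than once.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical event record: one function entry or exit on one thread.
struct TraceEvent {
    bool is_entry;        // true for entry, false for exit
    std::uintptr_t fn;    // address of the function
    std::uint64_t ts_ns;  // timestamp in nanoseconds
};

// Replays the events in order with an explicit stack and returns the
// total inclusive time per function address.
std::map<std::uintptr_t, std::uint64_t>
inclusive_time(const std::vector<TraceEvent> &events) {
    std::map<std::uintptr_t, std::uint64_t> total;
    std::vector<TraceEvent> stack;  // currently open calls, innermost last
    for (const TraceEvent &ev : events) {
        if (ev.is_entry) {
            stack.push_back(ev);
            continue;
        }
        // An exit must match the innermost open call.
        if (stack.empty() || stack.back().fn != ev.fn)
            throw std::runtime_error("unbalanced entry/exit events");
        total[ev.fn] += ev.ts_ns - stack.back().ts_ns;
        stack.pop_back();
    }
    return total;
}
```

The depth of the stack at each event is what the viewer draws as the
vertical position in the call stack graph.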
After studying the results at length, I created a diagram that
simplifies these interactions and shows the high-level behavior of Ceph.
Here's the IO path of a write_full(): how it goes from the client to
all OSDs and back. The OSDs talk to each other through their respective
msg-worker threads using sockets, and the threads of the same OSD talk
to each other through condition variables (int Wait(Mutex &mutex) in
Cond.h).
Colored arrows are messages across nodes (over the network). Thick
horizontal lines separate nodes, thin horizontal lines separate threads
on the same node.
https://docs.google.com/drawings/d/1fFb-aI8NQq7RESV2OKzrte2KvaBWAh6X9TL7woAGAmo/edit?usp=sharing
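
As far as I can tell, the Wait(Mutex &mutex) call mentioned above is a
wait on a condition variable, which is why a thread shows up as blocked
until another thread of the same OSD signals it. The sketch below shows
the same handoff pattern with standard C++ primitives. It is not Ceph
code; HandoffQueue and run_handoff() are hypothetical names used only
for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical queue: one thread pushes work, another blocks until it arrives.
class HandoffQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        cond_.notify_one();  // wake the waiting thread
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The thread is blocked here until the queue is non-empty.
        cond_.wait(lock, [this] { return !items_.empty(); });
        int item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<int> items_;
};

// Pushes 1..n from the calling thread and sums them in a worker thread.
int run_handoff(int n) {
    HandoffQueue q;
    int sum = 0;
    std::thread worker([&] {
        for (int i = 0; i < n; ++i) sum += q.pop();
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    worker.join();  // join() makes the final value of sum visible here
    return sum;
}
```

In a kernel trace, the time the worker spends inside wait() is the
blocked (yellow) state, and the critical path follows the wakeup back to
the thread that called notify_one().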
I did the same exercise for librbd (built with -finstrument-functions).
Here's the IO path in librbd for writes against images with RBD
journaling enabled and disabled. The red parts only appear when
journaling is enabled. There are a few more differences in the final
flush() part, but I've omitted them for simplicity. If there's
need/interest I can add those as well. The dotted vertical lines
separate iterations of writes() in the client (rbd tool).
https://docs.google.com/drawings/d/1s9pRVygdPzBtQyYrvPueF8Cm6Xi1d4oy5gNMOjx2Wzo/edit?usp=sharing
I think these are the kind of diagrams we'll be able to get more easily
from Zipkin/Blkin.
Mohamad
[1] https://drive.google.com/open?id=0B0_2xRlvBgt9c3ltaTNmMEplTlE
[2] https://drive.google.com/open?id=0B0_2xRlvBgt9TjVNTUpzWkpITjQ
[3] https://drive.google.com/open?id=0B0_2xRlvBgt9dl9vS083X2sxWTQ
[4] https://github.com/mogeb/ceph/tree/wip-finstrument-functions