A couple of options:

1) You can enable LTTng-UST tracing [1][2] against your VM for an extremely
   lightweight way to track IO latencies.
2) You can enable "debug rbd = 20" on the client and grep through the logs
   for matching "AioCompletion.*(set_request_count|finalize)" entries.
3) You can use the asok (admin socket) file during one of these events to
   dump the in-flight objecter requests.

[1] http://docs.ceph.com/docs/jewel/rbd/rbd-replay/
[2] http://tracker.ceph.com/issues/14629

On Tue, Apr 4, 2017 at 7:36 AM, Laszlo Budai <laszlo@xxxxxxxxxxxxxxxx> wrote:
> Hello cephers,
>
> I have a situation where, from time to time, write operations to the Ceph
> storage hang for 3-5 seconds. For testing we have a simple loop like:
>
> while sleep 1; do date >> logfile; done &
>
> With this we can see that, rarely, there are 3 seconds or more between
> consecutive outputs of date.
> Initially we suspected deep scrub and tuned its parameters, so right now
> I'm confident that the cause is something other than deep scrubbing.
>
> I would like to know whether any of you have encountered a similar
> situation, and what the solution was.
> I suspect the network between the compute nodes and the storage, but I
> need to prove this. I am thinking of enabling client-side logging for
> librbd, but I see there are many subsystems where logging can be enabled.
> Can anyone tell me which subsystem I should log, and at which level, to
> see whether the network is causing the write issues?
> We're using Ceph 0.94.10.
>
> Thank you,
> Laszlo
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Jason
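
P.S. For option 2, a minimal sketch of the log filter. The sample log line
below is fabricated for illustration; real "debug rbd = 20" entries carry
timestamps whose gaps point at the slow requests, and the log path depends on
your "log file" setting in the [client] section of ceph.conf:

```shell
# Filter verbose librbd client logs for AioCompletion accounting entries.
# The sample line is fabricated; in practice point the grep at your real
# client log, e.g. /var/log/ceph/client.log.
printf '%s\n' \
  '2017-04-04 07:36:01.123456 7f2a 20 librbd::AioCompletion: finalize: rval=0' \
  | grep -E 'AioCompletion.*(set_request_count|finalize)'
```

To capture such entries, set "debug rbd = 20" and a "log file" under
[client] in ceph.conf on the compute node before starting the VM.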
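
The date loop in the quoted message only shows stalls indirectly, by gaps in
the output. A sketch of a variant that records the stalls themselves — the
2-second threshold and the file names are arbitrary choices, and it runs
three iterations here just for illustration (use "while true" in practice):

```shell
# Append a timestamp once a second and log any iteration whose wall-clock
# gap exceeds the threshold, i.e. a write that hung.
threshold=2
prev=$(date +%s)
for i in 1 2 3; do
  sleep 1
  date >> logfile            # the write whose latency we are probing
  now=$(date +%s)
  gap=$((now - prev))
  if [ "$gap" -gt "$threshold" ]; then
    echo "stall: ${gap}s ending $(date)" >> stalls.log
  fi
  prev=$now
done
```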