On Mon, Dec 22, 2014 at 8:20 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> Hi,
>
> While investigating slow requests on a Firefly (0.80.7) cluster I looked
> at the historic ops from the admin socket.
>
> On an OSD which had just spat out some slow requests I noticed:
>
> "received_at": "2014-12-22 17:08:41.496391",
> "age": "9.948475",
> "duration": "5.915489"
>
> { "time": "2014-12-22 17:08:41.496687",
>   "event": "waiting_for_osdmap"},
> { "time": "2014-12-22 17:08:46.216946",
>   "event": "reached_pg"},
>
> It spent about 5 seconds in "waiting_for_osdmap".
>
> Another request:
>
> "received_at": "2014-12-22 17:08:41.499092",
> "age": "9.945774",
> "duration": "9.851261",
>
> { "time": "2014-12-22 17:08:41.499322",
>   "event": "waiting_for_osdmap"},
> { "time": "2014-12-22 17:08:51.349938",
>   "event": "reached_pg"}
>
> How should I interpret this? What is the OSD actually doing?
>
> In this case it is an RBD workload with all clients running 0.80.5
> librados.
>
> The mons are in quorum, time is in sync, and there are no osdmap
> changes happening at this moment.
>
> An earlier thread [0] suggested that it might also be a PG issue where
> requests are serialized.
>
> On some occasions I do see disks spiking to 100% busy for some time, but
> I just want to understand waiting_for_osdmap better so that I fully
> understand what is happening there.

What message types are these? The waiting_for_osdmap state is supposed
to cover only waiting for the OSD map, but there might be some
overlooked blocking points or something.
-Greg
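
The time an op spent between two events can be read straight off the timestamps in the historic ops output. Below is a minimal Python sketch of that arithmetic, using only the event entries quoted above. The helper name event_gaps and the variable names are hypothetical, and the sketch assumes each event is a dict with "time" and "event" keys in the timestamp format shown in the quoted output; it does not parse the full admin socket dump.

```python
from datetime import datetime

# Timestamp format as it appears in the quoted historic ops output.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def event_gaps(events):
    """Return (from_event, to_event, seconds) for each consecutive event pair.

    `events` is assumed to be a list of dicts with "time" and "event" keys,
    ordered as they appear in the op's event list.
    """
    gaps = []
    for prev, cur in zip(events, events[1:]):
        start = datetime.strptime(prev["time"], TIME_FORMAT)
        end = datetime.strptime(cur["time"], TIME_FORMAT)
        gaps.append((prev["event"], cur["event"], (end - start).total_seconds()))
    return gaps


# The two event lists quoted in the message above.
first_op = [
    {"time": "2014-12-22 17:08:41.496687", "event": "waiting_for_osdmap"},
    {"time": "2014-12-22 17:08:46.216946", "event": "reached_pg"},
]
second_op = [
    {"time": "2014-12-22 17:08:41.499322", "event": "waiting_for_osdmap"},
    {"time": "2014-12-22 17:08:51.349938", "event": "reached_pg"},
]

for op in (first_op, second_op):
    for src, dst, secs in event_gaps(op):
        print(f"{src} -> {dst}: {secs:.6f} s")
```

For the two ops quoted above this gives roughly 4.72 s and 9.85 s between waiting_for_osdmap and reached_pg, which accounts for most of each op's reported duration.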