Re: Slow requests: waiting_for_osdmap

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 22 Dec 2014 10:04:48 -0800



On Mon, Dec 22, 2014 at 8:20 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> Hi,
>
> While investigating slow requests on a Firefly (0.80.7) cluster, I looked
> at the historic ops from the admin socket.
>
> On an OSD which had just spat out some slow requests I noticed:
>
>           "received_at": "2014-12-22 17:08:41.496391",
>           "age": "9.948475",
>           "duration": "5.915489"
>
>           { "time": "2014-12-22 17:08:41.496687",
>             "event": "waiting_for_osdmap"},
>           { "time": "2014-12-22 17:08:46.216946",
>             "event": "reached_pg"},
>
> It spent almost 5 seconds in "waiting_for_osdmap".
>
> Another request:
>
>           "received_at": "2014-12-22 17:08:41.499092",
>           "age": "9.945774",
>           "duration": "9.851261",
>
>         { "time": "2014-12-22 17:08:41.499322",
>           "event": "waiting_for_osdmap"},
>         { "time": "2014-12-22 17:08:51.349938",
>           "event": "reached_pg"}
>
> How should I interpret this? What is the OSD actually doing?
>
> In this case it is a RBD workload with all clients running with 0.80.5
> librados.
>
> The mons are in quorum, time is in sync, and there are no osdmap
> changes happening at this moment.
>
> An earlier thread [0] suggested that it might also be a PG issue where
> requests are serialized.
>
> On some occasions I do see disks spiking to 100% busy for some time, but
> I want to understand the waiting_for_osdmap state better so I can fully
> understand what is happening there.

What message types are these? The waiting_for_osdmap state is supposed
to cover only time spent waiting for an OSD map, but there might be
some overlooked blocking points between it and reached_pg.
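
For reference, the time spent between two events can be computed directly
from the timestamps in the dump. This is a minimal Python sketch, assuming
the event list has the "time"/"event" shape quoted above; the helper name
event_gaps is made up for illustration and is not part of Ceph.

```python
from datetime import datetime

# Timestamp layout used in the quoted historic ops output.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def event_gaps(events):
    """Return (from_event, to_event, seconds) for each consecutive pair.

    `events` is assumed to be a list of dicts with "time" and "event"
    keys, in the order the OSD recorded them.
    """
    gaps = []
    for prev, cur in zip(events, events[1:]):
        start = datetime.strptime(prev["time"], TIME_FORMAT)
        end = datetime.strptime(cur["time"], TIME_FORMAT)
        gaps.append((prev["event"], cur["event"], (end - start).total_seconds()))
    return gaps


# Events copied from the first request quoted above.
events = [
    {"time": "2014-12-22 17:08:41.496687", "event": "waiting_for_osdmap"},
    {"time": "2014-12-22 17:08:46.216946", "event": "reached_pg"},
]

for src, dst, seconds in event_gaps(events):
    print(f"{src} -> {dst}: {seconds:.6f}s")
```

For the first request above this gives about 4.72 seconds between
waiting_for_osdmap and reached_pg.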
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com