On Mon, Dec 22, 2014 at 10:30 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> For example, two ops:
>
> #1:
>
> { "description": "osd_sub_op(client.2433432.0:61603164 20.424 19038c24\/rbd_data.d7c912ae8944a.00000000000008b6\/head\/\/20 [] v 63283'8301089 snapset=0=[]:[] snapc=0=[])",
>   "received_at": "2014-12-22 19:26:37.458680",
>   "age": "2.719850",
>   "duration": "2.520937",
>   "type_data": [
>     "commit sent; apply or cleanup",
>     [
>       { "time": "2014-12-22 19:26:37.458914",
>         "event": "waiting_for_osdmap"},
>       { "time": "2014-12-22 19:26:39.310569",
>         "event": "reached_pg"},
>       { "time": "2014-12-22 19:26:39.310728",
>         "event": "started"},
>       { "time": "2014-12-22 19:26:39.310951",
>         "event": "started"},
>       { "time": "2014-12-22 19:26:39.979292",
>         "event": "commit_queued_for_journal_write"},
>       { "time": "2014-12-22 19:26:39.979348",
>         "event": "write_thread_in_journal_buffer"},
>       { "time": "2014-12-22 19:26:39.979594",
>         "event": "journaled_completion_queued"},
>       { "time": "2014-12-22 19:26:39.979617",
>         "event": "commit_sent"}]]},
>
> #2:
>
> { "description": "osd_sub_op(client.2188703.0:10420738 20.641 6673ee41\/rbd_data.9497e32794ff7.0000000000000454\/head\/\/20 [] v 63283'5215076 snapset=0=[]:[] snapc=0=[])",
>   "received_at": "2014-12-22 19:26:38.040551",
>   "age": "2.137979",
>   "duration": "1.537128",
>   "type_data": [
>     "started",
>     [
>       { "time": "2014-12-22 19:26:38.040717",
>         "event": "waiting_for_osdmap"},
>       { "time": "2014-12-22 19:26:39.577609",
>         "event": "reached_pg"},
>       { "time": "2014-12-22 19:26:39.577624",
>         "event": "started"},
>       { "time": "2014-12-22 19:26:39.577679",
>         "event": "started"}]]},

Oh, yep, in Firefly it's stuck in the waiting_for_osdmap state while it's
in the PG work queue as well. Whoops... So this is probably just general
slowness filling up the work queue.

> Can this be something which has to do with the amount of RBD snapshots?
> Since I see snapc involved in both ops?

It could conceivably have something to do with snapshots, but if it does,
the presence of "snapc" isn't an indicator; that field is always present
and here it's just showing the default (empty) snap context. :)

If you're seeing disks at 100% I think stuff's just getting a little
backed up. You could also check the distribution of incoming operations
across PGs; if, e.g., a flood of ops is going to one object, that could
also cause issues.
-Greg
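
A minimal sketch of the per-PG tally suggested above, run on the host where
the OSD lives, against its admin socket. It assumes the dump layout shown in
the quoted ops (an "ops"/"Ops" list whose "description" field carries the PG
id as the second token inside the parentheses, e.g. "20.424"); key names and
the description format vary between Ceph releases, and the helper name
ops_per_pg is only for illustration.

    #!/usr/bin/env python
    # Sketch: tally historic/in-flight ops per PG from an OSD admin socket dump.
    # Assumes the JSON layout shown in the quoted ops above; adjust the key
    # names ("ops" vs "Ops") and parsing for your Ceph release.
    import collections
    import json
    import subprocess

    def ops_per_pg(osd_id, cmd="dump_historic_ops"):
        # "dump_ops_in_flight" also works here for currently queued ops.
        out = subprocess.check_output(["ceph", "daemon", "osd.%s" % osd_id, cmd])
        dump = json.loads(out.decode("utf-8"))
        ops = dump.get("ops") or dump.get("Ops") or []
        counts = collections.Counter()
        for op in ops:
            # e.g. "osd_sub_op(client.2433432.0:61603164 20.424 19038c24/...)"
            tokens = op.get("description", "").split("(", 1)[-1].split()
            if len(tokens) >= 2:
                counts[tokens[1]] += 1
        return counts

    if __name__ == "__main__":
        # Show the ten busiest PGs for osd.0.
        for pg, n in ops_per_pg(0).most_common(10):
            print("%s %d" % (pg, n))

If one PG (or one object within it) dominates the counts, that points at a
hot spot rather than general disk saturation.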