On 12/22/2014 07:04 PM, Gregory Farnum wrote:
> On Mon, Dec 22, 2014 at 8:20 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>> Hi,
>>
>> While investigating slow requests on a Firefly (0.80.7) cluster I
>> looked at the historic ops from the admin socket.
>>
>> On an OSD which had just spit out some slow requests I noticed:
>>
>> "received_at": "2014-12-22 17:08:41.496391",
>> "age": "9.948475",
>> "duration": "5.915489"
>>
>> { "time": "2014-12-22 17:08:41.496687",
>>   "event": "waiting_for_osdmap"},
>> { "time": "2014-12-22 17:08:46.216946",
>>   "event": "reached_pg"},
>>
>> It spent almost 5 seconds in "waiting_for_osdmap".
>>
>> Another request:
>>
>> "received_at": "2014-12-22 17:08:41.499092",
>> "age": "9.945774",
>> "duration": "9.851261",
>>
>> { "time": "2014-12-22 17:08:41.499322",
>>   "event": "waiting_for_osdmap"},
>> { "time": "2014-12-22 17:08:51.349938",
>>   "event": "reached_pg"}
>>
>> How should I read this? What is the OSD actually doing?
>>
>> In this case it is an RBD workload with all clients running 0.80.5
>> librados.
>>
>> The mons are in quorum, time is in sync, and there are no osdmap
>> changes happening at this moment.
>>
>> An earlier thread [0] suggested that it might also be a PG issue where
>> requests are serialized.
>>
>> I do on some occasions see disks spiking to 100% busy for some time,
>> but I just want to understand the waiting_for_osdmap state better to
>> fully understand what is happening there.
>
> What message types are these? The waiting_for_osdmap state is supposed
> to cover only that, but there might be some overlooked blocking points
> or something.

For example, two ops:

#1:

{ "description": "osd_sub_op(client.2433432.0:61603164 20.424 19038c24\/rbd_data.d7c912ae8944a.00000000000008b6\/head\/\/20 [] v 63283'8301089 snapset=0=[]:[] snapc=0=[])",
  "received_at": "2014-12-22 19:26:37.458680",
  "age": "2.719850",
  "duration": "2.520937",
  "type_data": [
      "commit sent; apply or cleanup",
      [
          { "time": "2014-12-22 19:26:37.458914",
            "event": "waiting_for_osdmap"},
          { "time": "2014-12-22 19:26:39.310569",
            "event": "reached_pg"},
          { "time": "2014-12-22 19:26:39.310728",
            "event": "started"},
          { "time": "2014-12-22 19:26:39.310951",
            "event": "started"},
          { "time": "2014-12-22 19:26:39.979292",
            "event": "commit_queued_for_journal_write"},
          { "time": "2014-12-22 19:26:39.979348",
            "event": "write_thread_in_journal_buffer"},
          { "time": "2014-12-22 19:26:39.979594",
            "event": "journaled_completion_queued"},
          { "time": "2014-12-22 19:26:39.979617",
            "event": "commit_sent"}]]},

#2:

{ "description": "osd_sub_op(client.2188703.0:10420738 20.641 6673ee41\/rbd_data.9497e32794ff7.0000000000000454\/head\/\/20 [] v 63283'5215076 snapset=0=[]:[] snapc=0=[])",
  "received_at": "2014-12-22 19:26:38.040551",
  "age": "2.137979",
  "duration": "1.537128",
  "type_data": [
      "started",
      [
          { "time": "2014-12-22 19:26:38.040717",
            "event": "waiting_for_osdmap"},
          { "time": "2014-12-22 19:26:39.577609",
            "event": "reached_pg"},
          { "time": "2014-12-22 19:26:39.577624",
            "event": "started"},
          { "time": "2014-12-22 19:26:39.577679",
            "event": "started"}]]},

Could this have something to do with the number of RBD snapshots, since I
see snapc involved in both ops?

This cluster has about 4,000 RBD images and about 1,000 of them have a
snapshot. The total number of OSDs in this case is 174.
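In case it is useful, below is a minimal sketch of one way to pull these
ops and compute the gaps between consecutive events. It only assumes the
dump_historic_ops JSON format shown above; the OSD id, the "ceph daemon"
invocation and the 1 second threshold are just examples and may need
adjusting:

#!/usr/bin/env python
# Minimal sketch: dump historic ops from an OSD admin socket and report
# how long each op spent between consecutive events, flagging long stays
# such as waiting_for_osdmap -> reached_pg.  The OSD id and the threshold
# below are examples only.

import json
import subprocess
from datetime import datetime

OSD_ID = 3          # example OSD id, adjust as needed
THRESHOLD = 1.0     # seconds; report event gaps longer than this

def parse_ts(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

def dump_historic_ops(osd_id):
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_historic_ops"])
    data = json.loads(out)
    # older releases use "Ops" as the top-level key, newer ones "ops"
    return data.get("ops") or data.get("Ops") or []

def event_gaps(op):
    # the event list is the last element of type_data, both for osd_op
    # (three elements) and osd_sub_op (two elements) entries
    events = op["type_data"][-1]
    for prev, cur in zip(events, events[1:]):
        gap = (parse_ts(cur["time"]) - parse_ts(prev["time"])).total_seconds()
        yield prev["event"], cur["event"], gap

if __name__ == "__main__":
    for op in dump_historic_ops(OSD_ID):
        for ev_from, ev_to, gap in event_gaps(op):
            if gap > THRESHOLD:
                print("%6.3fs  %s -> %s  %s" %
                      (gap, ev_from, ev_to, op["description"][:60]))

Run on the node hosting the OSD, this would flag the roughly 1.8 and 1.5
second waiting_for_osdmap -> reached_pg gaps in the two ops above.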
> -Greg
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on