Been a while, but...

On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> I'm just trying to understand the steps each IO goes through and have been
> looking at the output dump historic ops command from the admin socket.
> There's a couple of steps I'm not quite sure what they mean and also
> slightly puzzled by the delay and was wondering if anybody could share some
> knowledge around this.
>
> Here is what I think I understand so far:
>
> Initiated = When the OSD received the OP

Yep, this is when the first byte of the message incoming off the wire
got noticed by the OSD.

>
> Queued for PG / Reached PG / Started = This seems to be how long the OSD has
> to wait to get a lock on the PG before actually starting the write. Correct?

Right, that's "waiting for PG lock", "got into PG", and "PG started it
through the disk writing process"

> Is there any perf stats to track this number? And why do I see a 150ms delay
> before started. Am I possibly hitting some sort of queue on the PG? Is this
> just a large queue of requests on the PG that are waiting to be written to
> the journal? Any tips to reduce this?

A delay before "started" is contention of some sort; which one depends
on what state it's blocked in. If it was already in "Reached PG", that
means the PG (or, possibly, some other PG within the same thread
shard) was busy until that point. Earlier on, it might be network
contention or one of the throttles that limits how many uncommitted
ops the OSD will accept at once.

>
> Waiting for Sub Ops = Self-explanatory, its waiting for replica OSD's to
> apply the op to journal
>
> commit_queued_for_journal_write/ write_thread_in_journal_buffer/
> journaled_completion_queued/ op_commit = How long it takes to queue and
> write to the journal. In example case its 4ms....seems very high for s3700
> SSD? Maybe lots of ops are queued up? Most other ops show this <1ms.

Yep to all that.
If the journal is taking an unexpected amount of time it could also
have hit a throttle (to keep it from going too far ahead of the
backing store).

>
> sub_op_commit_rec = This is where we hear back from the replica OSD's

Yep.

>
> op_applied/done = We have finished so send ACK back to client

Applied means it's been given to the backing store's filesystem; done
means we actually sent the client ack back into the TCP stack.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
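
For anyone wanting to see which step an op actually sat in, here is a
rough sketch that turns a list of (timestamp, event) pairs into per-step
gaps. It is only an illustration: the timestamp layout, the helper names
(stage_durations, slowest_stage) and the sample values are assumptions
made up for this example, not output from a real cluster. The layout of
the historic ops dump differs between Ceph versions, so pulling the
pairs out of the dump is left to the reader.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust it to whatever your Ceph version prints.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stage_durations(events):
    """Return (event_name, microseconds since the previous event) pairs.

    `events` is a time-ordered list of (timestamp_string, event_name).
    The first event has no predecessor, so it is not in the result.
    """
    parsed = [(datetime.strptime(stamp, TIME_FORMAT), name)
              for stamp, name in events]
    durations = []
    for (prev_time, _), (cur_time, cur_name) in zip(parsed, parsed[1:]):
        # Integer division by one microsecond avoids float rounding.
        gap_us = (cur_time - prev_time) // timedelta(microseconds=1)
        durations.append((cur_name, gap_us))
    return durations


def slowest_stage(events):
    """Return the (event_name, microseconds) pair with the largest gap."""
    return max(stage_durations(events), key=lambda item: item[1])


# Hypothetical sample, shaped after the case in the thread: roughly
# 150ms before "started" and roughly 4ms spent on the journal.
SAMPLE_EVENTS = [
    ("2016-02-25 09:50:00.000000", "initiated"),
    ("2016-02-25 09:50:00.000100", "queued_for_pg"),
    ("2016-02-25 09:50:00.000200", "reached_pg"),
    ("2016-02-25 09:50:00.150200", "started"),
    ("2016-02-25 09:50:00.150300", "commit_queued_for_journal_write"),
    ("2016-02-25 09:50:00.154300", "op_commit"),
    ("2016-02-25 09:50:00.154500", "done"),
]

if __name__ == "__main__":
    for name, gap_us in stage_durations(SAMPLE_EVENTS):
        print(f"{name:35s} +{gap_us / 1000:.3f} ms")
```

The gap is attributed to the event that ends it, so a large number next
to "started" means the op was stuck in the state before it (here,
"reached_pg"), which is the state to look at when working out which
kind of contention it hit.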