> Been a while, but... Brilliant, just what I needed to know. Thanks for the
> confirmation/answers.
>
> On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > I'm just trying to understand the steps each IO goes through and have
> > been looking at the output of the dump historic ops command from the
> > admin socket.
> > There are a couple of steps whose meaning I'm not quite sure of, and I'm
> > also slightly puzzled by the delay, so I was wondering if anybody could
> > share some knowledge around this.
> >
> > Here is what I think I understand so far:
> >
> > Initiated = When the OSD received the OP
>
> Yep, this is when the first byte of the message incoming off the wire got
> noticed by the OSD.
>
> > Queued for PG / Reached PG / Started = This seems to be how long the
> > OSD has to wait to get a lock on the PG before actually starting the
> > write. Correct?
>
> Right, that's "waiting for PG lock", "got into PG", and "PG started it
> through the disk writing process"
>
> > Are there any perf stats to track this number? And why do I see a 150ms
> > delay before started? Am I possibly hitting some sort of queue on the
> > PG? Is this just a large queue of requests on the PG that are waiting
> > to be written to the journal? Any tips to reduce this?
>
> A delay before "started" is contention of some sort; which one depends on
> what state it's blocked in. If it was already in "Reached PG", that means
> the PG (or, possibly, some other PG within the same thread shard) was busy
> until that point. Earlier on, it might be network contention or one of the
> throttles that limits how many uncommitted ops the OSD will accept at once.
>
> > Waiting for Sub Ops = Self-explanatory, it's waiting for replica OSDs
> > to apply the op to journal
> >
> > commit_queued_for_journal_write / write_thread_in_journal_buffer /
> > journaled_completion_queued / op_commit = How long it takes to queue
> > and write to the journal. In the example case it's 4ms... seems very
> > high for s3700 SSD? Maybe lots of ops are queued up? Most other ops
> > show this <1ms.
>
> Yep to all that. If the journal is taking an unexpected amount of time it
> could also have hit a throttle (to keep it from going too far ahead of the
> backing store).
>
> > sub_op_commit_rec = This is where we hear back from the replica OSDs
>
> Yep.
>
> > op_applied/done = We have finished so send ACK back to client
>
> Applied means it's been given to the backing store's filesystem; done means
> we actually sent the client ack back into the TCP stack.
> -Greg
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
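
A minimal Python sketch of reading an event timeline like the one discussed in this thread: given (timestamp, event) pairs in time order, it reports the gap before each event, which is how figures such as the 150ms wait before "started" or the 4ms journal write are read off. The timestamp layout, the sample values, the lowercase event labels, and the function names are all assumptions made up for illustration; none of them are taken from real dump historic ops output, whose exact layout may differ between Ceph versions.

```python
from datetime import datetime

# Hypothetical (time, event) pairs. The values are invented for illustration
# and only loosely mirror the delays mentioned in the thread.
SAMPLE_EVENTS = [
    ("2016-02-25 09:50:00.000000", "initiated"),
    ("2016-02-25 09:50:00.000100", "queued_for_pg"),
    ("2016-02-25 09:50:00.150100", "reached_pg"),
    ("2016-02-25 09:50:00.150200", "started"),
    ("2016-02-25 09:50:00.150400", "commit_queued_for_journal_write"),
    ("2016-02-25 09:50:00.154400", "op_commit"),
    ("2016-02-25 09:50:00.155000", "done"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed timestamp layout


def stage_latencies(events):
    """Return [(event, ms_since_previous_event)] for a time-ordered list."""
    parsed = [(datetime.strptime(t, TIME_FORMAT), name) for t, name in events]
    result = []
    # Pair each event with its predecessor and measure the gap between them.
    for (prev_time, _), (cur_time, name) in zip(parsed, parsed[1:]):
        delta_ms = (cur_time - prev_time).total_seconds() * 1000.0
        result.append((name, delta_ms))
    return result


def slowest_stage(events):
    """Return the (event, ms) pair that had the largest gap before it."""
    return max(stage_latencies(events), key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, ms in stage_latencies(SAMPLE_EVENTS):
        print(f"{ms:9.3f} ms  -> {name}")
    print("slowest:", slowest_stage(SAMPLE_EVENTS))
```

A large gap attributed to an event means the op sat in the previous state for that long, so in the sample data the time is spent between being queued for the PG and reaching it.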