Re: Dump Historic Ops Breakdown

Nick Fisk <nick@xxxxxxxxxx> · Tue, 29 Mar 2016 19:46:19 +0100

> Been a while, but...

Brilliant, just what I needed to know. Thanks for the confirmation/answers.

> 
> On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> > I'm just trying to understand the steps each IO goes through and have
> > been looking at the output of the dump_historic_ops command from the
> > admin socket. There are a couple of steps whose meaning I'm not quite
> > sure of, and I'm also slightly puzzled by the delay, so I was wondering
> > if anybody could share some knowledge around this.
> >
> > Here is what I think I understand so far:
> >
> > Initiated = When the OSD received the OP
> 
> Yep, this is when the first byte of the message incoming off the wire got
> noticed by the OSD.
> 
> >
> > Queued for PG / Reached PG / Started = This seems to be how long the
> > OSD has to wait to get a lock on the PG before actually starting the
> > write. Correct?
> 
> Right, that's "waiting for PG lock", "got into PG", and "PG started it
> through the disk writing process"
> 
> > Are there any perf stats to track this number? And why do I see a 150ms
> > delay before started? Am I possibly hitting some sort of queue on the
> > PG? Is this just a large queue of requests on the PG that are waiting
> > to be written to the journal? Any tips to reduce this?
> 
> A delay before "started" is contention of some sort; which one depends on
> what state it's blocked in. If it was already in "Reached PG", that means
> the PG (or, possibly, some other PG within the same thread shard) was busy
> until that point. Earlier on, it might be network contention or one of the
> throttles that limits how many uncommitted ops the OSD will accept at once.
> 
> >
> > Waiting for Sub Ops = Self-explanatory, it's waiting for the replica
> > OSDs to apply the op to journal
> >
> > commit_queued_for_journal_write / write_thread_in_journal_buffer /
> > journaled_completion_queued / op_commit = How long it takes to queue
> > and write to the journal. In the example case it's 4ms, which seems very
> > high for an S3700 SSD? Maybe lots of ops are queued up? Most other ops
> > show this at <1ms.
> 
> Yep to all that. If the journal is taking an unexpected amount of time it
> could also have hit a throttle (to keep it from going too far ahead of the
> backing store).
> 
> >
> > sub_op_commit_rec = This is where we hear back from the replica OSDs
> 
> Yep.
> 
> >
> > op_applied/done = We have finished so send ACK back to client
> 
> Applied means it's been given to the backing store's filesystem; done
> means we actually sent the client ack back into the TCP stack.
> -Greg
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
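
For anyone wanting to turn the event sequence discussed above into
per-stage latencies, here is a minimal Python sketch. It assumes you
have already pulled the list of {"time", "event"} entries for a single
op out of the dump_historic_ops JSON; where that list sits in the JSON
and the exact timestamp layout differ between Ceph releases, so
TIME_FORMAT and the sample data below are assumptions for illustration,
not real output. The helper names (stage_deltas, slowest_stage) are
hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust it to whatever your Ceph release prints.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stage_deltas(events):
    """Return (event_name, ms_since_previous_event) for every event after the first."""
    parsed = [(datetime.strptime(e["time"], TIME_FORMAT), e["event"]) for e in events]
    deltas = []
    for (prev_time, _), (cur_time, name) in zip(parsed, parsed[1:]):
        # Integer microseconds avoid float rounding surprises.
        micros = (cur_time - prev_time) // timedelta(microseconds=1)
        deltas.append((name, micros / 1000))
    return deltas


def slowest_stage(events):
    """Return the (event_name, ms) pair with the longest wait before it."""
    return max(stage_deltas(events), key=lambda pair: pair[1])


# Invented sample for one op, shaped like the stages discussed in this thread.
sample_events = [
    {"time": "2016-02-25 09:50:00.000000", "event": "initiated"},
    {"time": "2016-02-25 09:50:00.000150", "event": "queued_for_pg"},
    {"time": "2016-02-25 09:50:00.150200", "event": "reached_pg"},
    {"time": "2016-02-25 09:50:00.150400", "event": "started"},
    {"time": "2016-02-25 09:50:00.150900", "event": "commit_queued_for_journal_write"},
    {"time": "2016-02-25 09:50:00.151500", "event": "write_thread_in_journal_buffer"},
    {"time": "2016-02-25 09:50:00.154300", "event": "journaled_completion_queued"},
    {"time": "2016-02-25 09:50:00.154900", "event": "op_commit"},
    {"time": "2016-02-25 09:50:00.156000", "event": "sub_op_commit_rec"},
    {"time": "2016-02-25 09:50:00.156500", "event": "op_applied"},
    {"time": "2016-02-25 09:50:00.156700", "event": "done"},
]

if __name__ == "__main__":
    for name, ms in stage_deltas(sample_events):
        print(f"{ms:10.3f} ms  -> {name}")
    print("slowest:", slowest_stage(sample_events))
```

Each delta is attributed to the event that ends it, so a large number
next to reached_pg means the op sat waiting after being queued for the
PG, which is the kind of contention Greg describes.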
