Re: Dump Historic Ops Breakdown

Gregory Farnum <gfarnum@xxxxxxxxxx> · Tue, 29 Mar 2016 11:35:23 -0700

Been a while, but...

On Thu, Feb 25, 2016 at 9:50 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> I'm just trying to understand the steps each IO goes through and have been
> looking at the output of the dump_historic_ops command from the admin socket.
> There are a couple of steps whose meaning I'm not quite sure of, and I'm also
> slightly puzzled by the delay, so I was wondering if anybody could share some
> knowledge around this.
>
> Here is what I think I understand so far:
>
> Initiated = When the OSD received the OP

Yep, this is when the OSD noticed the first byte of the incoming message
coming off the wire.

>
> Queued for PG / Reached PG / Started = This seems to be how long the OSD has
> to wait to get a lock on the PG before actually starting the write. Correct?

Right, that's "waiting for PG lock", "got into PG", and "PG started it
through the disk writing process".

> Are there any perf stats to track this number? And why do I see a 150ms delay
> before "started"? Am I possibly hitting some sort of queue on the PG? Is this
> just a large queue of requests on the PG that are waiting to be written to
> the journal? Any tips to reduce this?

A delay before "started" is contention of some sort; which one depends
on what state it's blocked in. If it was already in "Reached PG", that
means the PG (or, possibly, some other PG within the same thread
shard) was busy until that point. Earlier on, it might be network
contention or one of the throttles that limits how many uncommitted
ops the OSD will accept at once.
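
To see which state an op was blocked in, it can help to turn the event
timestamps into per-step gaps. A rough sketch follows. It assumes you have
already pulled one op's event list out of the dump as a list of dicts with
"time" and "event" keys; that shape, the timestamp format, the helper names,
and the sample values are all assumptions for illustration, and the real
layout differs between Ceph versions.

```python
from datetime import datetime

# Assumed timestamp layout; adjust TIME_FORMAT to match what your OSD prints.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def event_deltas(events, time_format=TIME_FORMAT):
    """Return (event, milliseconds since the previous event), in order."""
    deltas = []
    previous = None
    for entry in events:
        stamp = datetime.strptime(entry["time"], time_format)
        if previous is None:
            gap_ms = 0.0  # the first event has nothing before it
        else:
            gap_ms = (stamp - previous).total_seconds() * 1000.0
        deltas.append((entry["event"], gap_ms))
        previous = stamp
    return deltas


def slowest_step(events, time_format=TIME_FORMAT):
    """Return the (event, gap_ms) pair that was reached after the longest wait."""
    return max(event_deltas(events, time_format), key=lambda pair: pair[1])


# Made-up timeline, for illustration only (not taken from a real OSD).
sample = [
    {"time": "2016-02-25 09:50:00.000000", "event": "initiated"},
    {"time": "2016-02-25 09:50:00.000200", "event": "queued_for_pg"},
    {"time": "2016-02-25 09:50:00.000400", "event": "reached_pg"},
    {"time": "2016-02-25 09:50:00.150400", "event": "started"},
]

for name, gap_ms in event_deltas(sample):
    print(f"{name:20s} +{gap_ms:.3f} ms")
```

The event with the largest gap is the one the op reached after its longest
wait, so the state just before it is where the op was blocked.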

>
> Waiting for Sub Ops = Self-explanatory, it's waiting for replica OSDs to
> apply the op to the journal
>
> commit_queued_for_journal_write/ write_thread_in_journal_buffer/
> journaled_completion_queued/ op_commit = How long it takes to queue and
> write to the journal. In the example case it's 4ms, which seems very high for
> an S3700 SSD. Maybe lots of ops are queued up? Most other ops show this as <1ms.

Yep to all that. If the journal is taking an unexpected amount of time
it could also have hit a throttle (to keep it from going too far ahead
of the backing store).

>
> sub_op_commit_rec = This is where we hear back from the replica OSDs

Yep.

>
> op_applied/done = We have finished, so we send the ACK back to the client

Applied means it's been given to the backing store's filesystem; done
means we actually sent the client ack back into the TCP stack.
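
To keep the meanings from this thread next to a dump while reading it, a
small lookup table is enough. This is only a sketch: the descriptions are
paraphrased from the answers above, the helper names are hypothetical, and
the exact event strings an OSD emits vary by Ceph version, so the keys may
need adjusting.

```python
# Event meanings as described in this thread. The keys are assumptions based
# on the names used above; real event strings differ between Ceph versions.
EVENT_MEANINGS = {
    "initiated": "OSD noticed the first byte of the incoming message",
    "queued_for_pg": "waiting for the PG lock",
    "reached_pg": "got into the PG",
    "started": "PG started the op through the disk writing process",
    "waiting_for_sub_ops": "waiting for the replica OSDs",
    "commit_queued_for_journal_write": "queueing and writing to the journal",
    "write_thread_in_journal_buffer": "queueing and writing to the journal",
    "journaled_completion_queued": "queueing and writing to the journal",
    "op_commit": "queueing and writing to the journal",
    "sub_op_commit_rec": "heard back from a replica OSD",
    "op_applied": "op was given to the backing store's filesystem",
    "done": "client ack was sent into the TCP stack",
}


def normalize(name):
    """Lowercase the name and join whitespace-separated words with underscores."""
    return "_".join(name.strip().lower().split())


def describe(name):
    """Look up an event name, tolerating spaces and capitals."""
    return EVENT_MEANINGS.get(normalize(name), "not covered in this thread")


for label in ("Queued for PG", "op_applied", "done"):
    print(f"{label}: {describe(label)}")
```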
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com