Re: Interpreting ceph osd pool stats output

First of all - thanks John for your patience!

I guess I still can't get past the different metrics being used:
client I/O is described one way, recovery another, and yet
fundamentally they both send ops to the OSDs, right? What's
interesting to me is that the recovery_rate metrics from pool stats
seem to be a higher-level 'product' of lower-level information. For
example, isn't recovering_objects_per_sec the result of multiple
read/write ops to the OSDs?

Also, don't get me wrong: the recovery_rate dict is cool, and it gives
a great view of object-level recovery. I was just hoping for common
metrics for the OSD ops that are shared by client and recovery
activity.

Since this isn't the case, what's the recommended way to determine how
busy a cluster is, across both recovery and client (rbd/rgw) requests?
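For what it's worth, the closest I've come is summing the per-pool
rates myself from `ceph osd pool stats -f json`. A rough sketch (the
client_io_rate keys are taken from the JSON quoted below in the
thread; I'm assuming recovery_rate exposes recovering_bytes_per_sec
the same way, and that keys are simply absent on idle pools):

```python
import json
import subprocess


def summarize(pools):
    """Roll up client and recovery rates across a list of per-pool
    stats dicts, as returned by `ceph osd pool stats -f json`.
    Missing dicts/keys (idle pools) are treated as zero."""
    totals = {"client_bytes_sec": 0,
              "client_op_per_sec": 0,
              "recovery_bytes_sec": 0}
    for pool in pools:
        cio = pool.get("client_io_rate") or {}
        rec = pool.get("recovery_rate") or {}
        totals["client_bytes_sec"] += (cio.get("read_bytes_sec", 0) +
                                       cio.get("write_bytes_sec", 0))
        totals["client_op_per_sec"] += (cio.get("read_op_per_sec", 0) +
                                        cio.get("write_op_per_sec", 0))
        # Recovery is only reported at the object/byte level, so this
        # is the best "busyness" signal I can extract for it.
        totals["recovery_bytes_sec"] += rec.get(
            "recovering_bytes_per_sec", 0)
    return totals


def cluster_activity():
    """Fetch live stats from the cluster and summarize them."""
    raw = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", "-f", "json"])
    return summarize(json.loads(raw))
```

Of course this still mixes op-level client counters with byte-level
recovery counters, which is exactly the apples-to-oranges problem I'm
asking about.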

On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>> response to a recovery operation should be the same as the metrics for
>>> client I/O.
>>
>> Ah, so the key part here I think is "describe the IO that the OSD
>> performs" -- the counters you've been looking at do not do that.  They
>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>> doing as a result.
>>
>> That's why you don't get an apples-to-apples comparison between client
>> IO and recovery -- if you were looking at disk IO stats from both, it
>> would be perfectly reasonable to combine/compare them.  When you're
>> looking at Ceph's own counters of client ops vs. recovery activity,
>> that no longer makes sense.
>>
>>> So in the context of a recovery operation, one OSD would
>>> report a read (recovery source) and another report a write (recovery
>>> target), together with their corresponding num_bytes. To my mind this
>>> provides transparency, and maybe helps potential automation.
>>
>> Okay, so if we were talking about disk IO counters, this would
>> probably make sense (one read wouldn't necessarily correspond to one
>> write), but if you had a counter that was telling you how many Ceph
>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>> received) the totals would just be zero.
>
> Sorry, that should have said the totals would just be equal.
>
> John
>
>>
>> John
>>
>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>>>> Thanks John
>>>>>>>
>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>>> following;
>>>>>>> {
>>>>>>>     "pool_name": "default.rgw.buckets.index",
>>>>>>>     "pool_id": 94,
>>>>>>>     "recovery": {},
>>>>>>>     "recovery_rate": {},
>>>>>>>     "client_io_rate": {
>>>>>>>         "read_bytes_sec": 19242365,
>>>>>>>         "write_bytes_sec": 0,
>>>>>>>         "read_op_per_sec": 12514,
>>>>>>>         "write_op_per_sec": 0
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> No object related counters - they're all block based. The plugin I
>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>>> client load.
>>>>>>
>>>>>> Where are you getting the idea that these counters have to do with
>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>>
>>>>>
>>>>> Perhaps it's my poor choice of words - apologies.
>>>>>
>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>> against the pool
>>>>>
>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>> be reported in the same way so you gain a view of the activity of the
>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>> load is coming from recovery.
>>>>
>>>> What would it mean to you for a recovery operation (one OSD sending
>>>> some data to another OSD) to be read vs. write?
>>>>
>>>> John