On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>> Fundamentally, the metrics that describe the IO the OSD performs in
>> response to a recovery operation should be the same as the metrics for
>> client I/O.
>
> Ah, so the key part here I think is "describe the IO that the OSD
> performs" -- the counters you've been looking at do not do that. They
> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
> doing as a result.
>
> That's why you don't get an apples-to-apples comparison between client
> IO and recovery -- if you were looking at disk IO stats from both, it
> would be perfectly reasonable to combine/compare them. When you're
> looking at Ceph's own counters of client ops vs. recovery activity,
> that no longer makes sense.
>
>> So in the context of a recovery operation, one OSD would
>> report a read (recovery source) and another report a write (recovery
>> target), together with their corresponding num_bytes. To my mind this
>> provides transparency, and maybe helps potential automation.
>
> Okay, so if we were talking about disk IO counters, this would
> probably make sense (one read wouldn't necessarily correspond to one
> write), but if you had a counter that was telling you how many Ceph
> recovery push/pull ops were "reading" (being sent) vs "writing" (being
> received) the totals would just be zero.

Sorry, that should have said the totals would just be equal.

John

>
> John
>
>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>>> Thanks John
>>>>>>
>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>> following;
>>>>>> {
>>>>>>     "pool_name": "default.rgw.buckets.index",
>>>>>>     "pool_id": 94,
>>>>>>     "recovery": {},
>>>>>>     "recovery_rate": {},
>>>>>>     "client_io_rate": {
>>>>>>         "read_bytes_sec": 19242365,
>>>>>>         "write_bytes_sec": 0,
>>>>>>         "read_op_per_sec": 12514,
>>>>>>         "write_op_per_sec": 0
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> No object related counters - they're all block based. The plugin I
>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>> client load.
>>>>>
>>>>> Where are you getting the idea that these counters have to do with
>>>>> block storage? What Ceph is telling you about here is the number of
>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>
>>>> Perhaps it's my poor choice of words - apologies.
>>>>
>>>> read_op_per_sec is the read IOP count to the OSDs from client activity
>>>> against the pool.
>>>>
>>>> My point is that client-io is expressed in these terms, but recovery
>>>> activity is not. I was hoping that both recovery and client I/O would
>>>> be reported in the same way so you gain a view of the activity of the
>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>> recovery activity to see how much is read or write, or how much IOP
>>>> load is coming from recovery.
>>>
>>> What would it mean to you for a recovery operation (one OSD sending
>>> some data to another OSD) to be read vs. write?
>>>
>>> John
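
[Editor's note: for reference, below is a minimal sketch (not from the
thread) of the kind of per-pool roll-up Paul describes: summing the
client_io_rate counters and the recovery_rate bytes across all pools from
`ceph osd pool stats` JSON output. The client_io_rate field names come
from the sample output quoted above; the recovery_rate field name
"recovering_bytes_per_sec" and the helper name total_load are assumptions
for illustration.]

#!/usr/bin/env python
# Sketch: roll up per-pool rate counters from `ceph osd pool stats -f json`
# into cluster-wide totals.
import json
import subprocess

def total_load():
    # `ceph osd pool stats -f json` emits a JSON array of per-pool objects
    # like the one quoted in the thread above.
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", "-f", "json"])
    pools = json.loads(out)

    totals = {"client_bytes_sec": 0, "client_ops_sec": 0,
              "recovery_bytes_sec": 0}
    for pool in pools:
        cio = pool.get("client_io_rate", {})
        # Ceph omits counters with no current activity, hence the defaults.
        totals["client_bytes_sec"] += (cio.get("read_bytes_sec", 0) +
                                       cio.get("write_bytes_sec", 0))
        totals["client_ops_sec"] += (cio.get("read_op_per_sec", 0) +
                                     cio.get("write_op_per_sec", 0))
        rec = pool.get("recovery_rate", {})
        # Assumed field name; recovery is not broken down into read/write
        # or op counts, which is exactly the gap discussed in the thread.
        totals["recovery_bytes_sec"] += rec.get("recovering_bytes_per_sec", 0)
    return totals

if __name__ == "__main__":
    print(json.dumps(total_load(), indent=4))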