On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> Fundamentally, the metrics that describe the IO the OSD performs in
> response to a recovery operation should be the same as the metrics for
> client I/O.

Ah, so the key part here I think is "describe the IO that the OSD
performs" -- the counters you've been looking at do not do that. They
describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
doing as a result.

That's why you don't get an apples-to-apples comparison between client
IO and recovery -- if you were looking at disk IO stats from both, it
would be perfectly reasonable to combine or compare them. When you're
looking at Ceph's own counters of client ops vs. recovery activity,
that no longer makes sense.

> So in the context of a recovery operation, one OSD would
> report a read (recovery source) and another report a write (recovery
> target), together with their corresponding num_bytes. To my mind this
> provides transparency, and maybe helps potential automation.

Okay, so if we were talking about disk IO counters, this would
probably make sense (one read wouldn't necessarily correspond to one
write), but if you had a counter telling you how many Ceph recovery
push/pull ops were "reading" (being sent) vs. "writing" (being
received), the two totals would always be equal -- every op "read" on
the source OSD is the same op "written" on the target -- so the split
wouldn't tell you anything.

John

> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> Thanks John
>>>>>
>>>>> This is weird then. When I look at the data with client load, I see
>>>>> the following:
>>>>>
>>>>> {
>>>>>     "pool_name": "default.rgw.buckets.index",
>>>>>     "pool_id": 94,
>>>>>     "recovery": {},
>>>>>     "recovery_rate": {},
>>>>>     "client_io_rate": {
>>>>>         "read_bytes_sec": 19242365,
>>>>>         "write_bytes_sec": 0,
>>>>>         "read_op_per_sec": 12514,
>>>>>         "write_op_per_sec": 0
>>>>>     }
>>>>> }
>>>>>
>>>>> No object-related counters - they're all block based. The plugin I
>>>>> have rolls up the block metrics across all pools to provide total
>>>>> client load.
>>>>
>>>> Where are you getting the idea that these counters have to do with
>>>> block storage? What Ceph is telling you about here is the number of
>>>> operations (or bytes in those operations) being handled by OSDs.
>>>
>>> Perhaps it's my poor choice of words - apologies.
>>>
>>> read_op_per_sec is the read IOP count against the OSDs from client
>>> activity on the pool.
>>>
>>> My point is that client I/O is expressed in these terms, but recovery
>>> activity is not. I was hoping that both recovery and client I/O would
>>> be reported in the same way, so you gain a view of the activity of
>>> the system as a whole. I can sum bytes_sec from client I/O with
>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>> recovery activity to see how much is read or write, or how much IOP
>>> load is coming from recovery.
>>
>> What would it mean to you for a recovery operation (one OSD sending
>> some data to another OSD) to be read vs. write?
>>
>> John
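
As an aside, here is a minimal sketch of the roll-up Paul describes:
summing client_io_rate and recovery_rate byte rates across all pools
from "ceph osd pool stats". The client_io_rate field names come from
the JSON sample above; the recovery_rate key (recovering_bytes_per_sec)
is an assumption -- check it against the output of your own release.

import json
import subprocess

def pool_stats():
    # Returns one entry per pool, in the same shape as the sample above.
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", "--format", "json"])
    return json.loads(out)

def totals():
    client_bytes = client_ops = recovery_bytes = 0
    for pool in pool_stats():
        cio = pool.get("client_io_rate", {})
        rec = pool.get("recovery_rate", {})
        client_bytes += (cio.get("read_bytes_sec", 0) +
                         cio.get("write_bytes_sec", 0))
        client_ops += (cio.get("read_op_per_sec", 0) +
                       cio.get("write_op_per_sec", 0))
        # Recovery is reported only as an aggregate rate: there is no
        # read/write or op-count split, which is the gap discussed above.
        # "recovering_bytes_per_sec" is an assumed key name.
        recovery_bytes += rec.get("recovering_bytes_per_sec", 0)
    return client_bytes, client_ops, recovery_bytes

if __name__ == "__main__":
    cb, cops, rb = totals()
    print("client: %d B/s (%d op/s), recovery: %d B/s" % (cb, cops, rb))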