On Tue, 14 Mar 2017, John Spray wrote:
> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> > First of all - thanks John for your patience!
> >
> > I guess I still can't get past the different metrics being used -
> > client I/O is described in one way, recovery in another, and yet
> > fundamentally they both send ops to the OSDs, right? To me, what's
> > interesting is that the recovery_rate metrics from pool stats seem to
> > be a higher-level 'product' of lower-level information - for example,
> > recovering_objects_per_sec: is this not a product of multiple
> > read/write ops to OSDs?
>
> While there is data being moved around, it would be misleading to say
> it's all just ops. The path that client ops go down is different to
> the path that recovery messages go down. Recovery data is gathered up
> into big vectors of object extents that are sent between OSDs; client
> ops are sent individually from clients. An OSD servicing 10 writes
> from 10 different clients is not directly comparable to an OSD
> servicing an MOSDPush message from another OSD that happens to contain
> updates to 10 objects.
>
> Client ops are also logically meaningful to consumers of the
> cluster, while the recovery stuff is a total implementation detail.
> The implementation of recovery could change at any time, and any
> counter generated from it will only be meaningful to someone who
> understands how recovery works on that particular version of the Ceph
> code.
>
> > Also, don't get me wrong - the recovery_rate dict is cool and it
> > gives a great view of object-level recovery - I was just hoping for
> > common metrics for the OSD ops that are shared by client and
> > recovery activity.
> >
> > Since this isn't the case, what's the recommended way to determine
> > how busy a cluster is - across recovery and client (rbd/rgw)
> > requests?
>
> I would say again that how busy a cluster is doing its job (client
> IO) is a very separate thing from how busy it is doing internal
> housekeeping. Imagine exposing this as a speedometer dial in a GUI
> (as people sometimes do) -- a cluster that was killing itself with
> recovery and completely blocking its clients would look like it was
> going nice and fast. In my view, exposing two separate numbers is the
> right thing to do, not a shortcoming.
>
> If you truly want to come up with some kind of single metric then you
> can: you could take the rate of change of the objects recovered, for
> example. If you wanted to, you could think of finishing recovery of
> one object as an "op". I would tend to think of this as the job of a
> higher-level tool, though, rather than a collectd plugin. Especially
> if the collectd plugin is meant to be general purpose, it should
> avoid inventing things like this.

I think the only other option is to take a measurement at a lower
layer. BlueStore doesn't currently, but could easily, have metrics
for bytes read and written. But again, this is a secondary product of
client and recovery: a client write, for example, will result in 3
writes across 3 OSDs (in a 3x replicated pool).

sage

> John
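As a minimal sketch of the single-metric idea above -- counting each
recovered object as one "op" and adding it to the client op rates --
something like the following Python would do, assuming the per-pool
JSON that `ceph osd pool stats -f json` emits (a sample appears
further down this thread). Per John's caveat, this is an invented
roll-up, not a counter Ceph itself exposes:

    #!/usr/bin/env python
    # Minimal sketch: one combined "ops" figure, per the discussion above.
    # Assumes `ceph osd pool stats -f json` returns a list of per-pool
    # dicts with client_io_rate and recovery_rate, as in the sample
    # quoted later in this thread.
    import json
    import subprocess

    def combined_ops_per_sec():
        raw = subprocess.check_output(
            ["ceph", "osd", "pool", "stats", "-f", "json"])
        total = 0
        for pool in json.loads(raw.decode("utf-8")):
            client = pool.get("client_io_rate", {})
            recovery = pool.get("recovery_rate", {})
            total += client.get("read_op_per_sec", 0)
            total += client.get("write_op_per_sec", 0)
            # Treat finishing recovery of one object as one "op" --
            # an invented, version-dependent metric (see caveat above).
            total += recovery.get("recovering_objects_per_sec", 0)
        return total

    if __name__ == "__main__":
        print("combined ops/sec: %d" % combined_ops_per_sec())

Note that, per Sage's point, even this undercounts actual disk
traffic: in a 3x replicated pool each client write fans out to three
OSD writes.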
> >
> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
> >>>> response to a recovery operation should be the same as the metrics
> >>>> for client I/O.
> >>>
> >>> Ah, so the key part here I think is "describe the IO that the OSD
> >>> performs" -- the counters you've been looking at do not do that.
> >>> They describe the ops the OSD is servicing, *not* the (disk) IO the
> >>> OSD is doing as a result.
> >>>
> >>> That's why you don't get an apples-to-apples comparison between
> >>> client IO and recovery -- if you were looking at disk IO stats from
> >>> both, it would be perfectly reasonable to combine/compare them.
> >>> When you're looking at Ceph's own counters of client ops vs.
> >>> recovery activity, that no longer makes sense.
> >>>
> >>>> So in the context of a recovery operation, one OSD would
> >>>> report a read (recovery source) and another report a write (recovery
> >>>> target), together with their corresponding num_bytes. To my mind
> >>>> this provides transparency, and maybe helps potential automation.
> >>>
> >>> Okay, so if we were talking about disk IO counters, this would
> >>> probably make sense (one read wouldn't necessarily correspond to one
> >>> write), but if you had a counter that was telling you how many Ceph
> >>> recovery push/pull ops were "reading" (being sent) vs "writing"
> >>> (being received), the totals would just be zero.
> >>
> >> Sorry, that should have said the totals would just be equal.
> >>
> >> John
> >>
> >>> John
> >>>
> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> >>>>>>>> Thanks John
> >>>>>>>>
> >>>>>>>> This is weird then. When I look at the data with client load I
> >>>>>>>> see the following:
> >>>>>>>> {
> >>>>>>>>     "pool_name": "default.rgw.buckets.index",
> >>>>>>>>     "pool_id": 94,
> >>>>>>>>     "recovery": {},
> >>>>>>>>     "recovery_rate": {},
> >>>>>>>>     "client_io_rate": {
> >>>>>>>>         "read_bytes_sec": 19242365,
> >>>>>>>>         "write_bytes_sec": 0,
> >>>>>>>>         "read_op_per_sec": 12514,
> >>>>>>>>         "write_op_per_sec": 0
> >>>>>>>>     }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> No object-related counters - they're all block-based. The
> >>>>>>>> plugin I have rolls up the block metrics across all pools to
> >>>>>>>> provide total client load.
> >>>>>>>
> >>>>>>> Where are you getting the idea that these counters have to do
> >>>>>>> with block storage? What Ceph is telling you about here is the
> >>>>>>> number of operations (or bytes in those operations) being
> >>>>>>> handled by OSDs.
> >>>>>>
> >>>>>> Perhaps it's my poor choice of words - apologies.
> >>>>>>
> >>>>>> read_op_per_sec is the read IOP count to the OSDs from client
> >>>>>> activity against the pool.
> >>>>>>
> >>>>>> My point is that client I/O is expressed in these terms, but
> >>>>>> recovery activity is not. I was hoping that both recovery and
> >>>>>> client I/O would be reported in the same way so you gain a view
> >>>>>> of the activity of the system as a whole.
> >>>>>> I can sum bytes_sec from client i/o with
> >>>>>> recovery_rate bytes_sec, which is something, but I can't see
> >>>>>> inside recovery activity to see how much is read or write, or how
> >>>>>> much IOP load is coming from recovery.
> >>>>>
> >>>>> What would it mean to you for a recovery operation (one OSD
> >>>>> sending some data to another OSD) to be read vs. write?
> >>>>>
> >>>>> John
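For the roll-up Paul describes -- summing bytes_sec across client and
recovery -- a sketch along the same lines, again assuming the pool
stats JSON shown earlier in the thread. The recovering_bytes_per_sec
key is an assumption inferred from the recovering_objects_per_sec
counter discussed above, so treat it as illustrative:

    #!/usr/bin/env python
    # Sketch: sum client and recovery byte rates across all pools.
    # Assumes the parsed output of `ceph osd pool stats -f json` (a
    # list of per-pool dicts, as in the sample quoted above); the
    # recovering_bytes_per_sec key name is assumed, not confirmed here.
    import json
    import subprocess

    def total_bytes_per_sec(pool_stats):
        client = recovery = 0
        for pool in pool_stats:
            io = pool.get("client_io_rate", {})
            client += io.get("read_bytes_sec", 0)
            client += io.get("write_bytes_sec", 0)
            # recovery_rate carries a single byte rate; as discussed
            # above, there is no read/write split to recover from it.
            rec = pool.get("recovery_rate", {})
            recovery += rec.get("recovering_bytes_per_sec", 0)
        return client, recovery

    if __name__ == "__main__":
        stats = json.loads(subprocess.check_output(
            ["ceph", "osd", "pool", "stats", "-f", "json"]).decode("utf-8"))
        print("client bytes/sec: %d, recovery bytes/sec: %d"
              % total_bytes_per_sec(stats))

This gives the two separate numbers John argues for, leaving any
combination of them to a higher-level tool.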