s/i was/i wasn't/ doh...it's late

On Mon, Mar 20, 2017 at 9:40 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
> I was suggesting inventing the data collector - more about how (formulas etc.) and what metrics we aggregate to derive meaningful metrics. pcp, collectd etc. give us a single component - what's the framework that ties all those pieces together to give us the cluster-wide view? If there is something out there, great...I'm not a fan of reinventing the wheel either :)
>
> On Mon, Mar 20, 2017 at 8:54 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>> John/Sage, thanks for the clarification and info. At this stage, I'll stick with the data I have, with John's caveats.
>>>
>>> The challenge in understanding the load going on in a cluster is definitely interesting, since the choke points are different depending on whether you look at the cluster through a hardware or software 'lens'.
>>>
>>> I think the interesting question is how does a customer know how 'full' their cluster is from a performance standpoint - i.e. when do I need to buy more or different hardware? Holy grail type stuff :)
>>>
>>> Is there any work going on in this space, perhaps analyzing the underlying components within the cluster like cpu, ram or disk util rates across the nodes?
>>
>> Wouldn't this be reinventing the wheel since this is something that things like pcp (collectd?) do very well already?
>>>
>>> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>>> On Tue, 14 Mar 2017, John Spray wrote:
>>>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> > First of all - thanks John for your patience!
>>>>> >
>>>>> > I guess I still can't get past the different metrics being used - client I/O is described in one way, recovery in another, and yet fundamentally they both send ops to the OSDs, right? To me, what's interesting is that the recovery_rate metrics from pool stats seem to be a higher-level 'product' of lower-level information - for example recovering_objects_per_sec: is this not a product of multiple read/write ops to OSDs?
>>>>>
>>>>> While there is data being moved around, it would be misleading to say it's all just ops. The path that client ops go down is different to the path that recovery messages go down. Recovery data is gathered up into big vectors of object extents that are sent between OSDs; client ops are sent individually from clients. An OSD servicing 10 writes from 10 different clients is not directly comparable to an OSD servicing an MOSDPush message from another OSD that happens to contain updates to 10 objects.
>>>>>
>>>>> Client ops are also logically meaningful to consumers of the cluster, while the recovery stuff is a total implementation detail. The implementation of recovery could change any time, and any counter generated from it will only be meaningful to someone who understands how recovery works on that particular version of the ceph code.
>>>>>
>>>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives a great view of object-level recovery - I was just hoping for common metrics for the OSD ops that are shared by client and recovery activity.
>>>>> >
>>>>> > Since this isn't the case, what's the recommended way to determine how busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>>>
>>>>> I would say again that how busy a cluster is doing its job (client IO) is a very separate thing from how busy it is doing internal housekeeping. Imagine exposing this as a speedometer dial in a GUI (as people sometimes do) -- a cluster that was killing itself with recovery and completely blocking its clients would look like it was going nice and fast. In my view, exposing two separate numbers is the right thing to do, not a shortcoming.
>>>>>
>>>>> If you truly want to come up with some kind of single metric then you can: you could take the rate of change of the objects recovered, for example. If you wanted to, you could think of finishing recovery of one object as an "op". I would tend to think of this as the job of a higher-level tool though, rather than a collectd plugin. Especially if the collectd plugin is meant to be general purpose, it should avoid inventing things like this.
>>>>
>>>> I think the only other option is to take a measurement at a lower layer. BlueStore doesn't currently, but could easily, have metrics for bytes read and written. But again, this is a secondary product of client and recovery: a client write, for example, will result in 3 writes across 3 OSDs (in a 3x replicated pool).
>>>>
>>>> sage
>>>>
>>>>> John
>>>>> >
>>>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in response to a recovery operation should be the same as the metrics for client I/O.
>>>>> >>>
>>>>> >>> Ah, so the key part here I think is "describe the IO that the OSD performs" -- the counters you've been looking at do not do that. They describe the ops the OSD is servicing, *not* the (disk) IO the OSD is doing as a result.
>>>>> >>>
>>>>> >>> That's why you don't get an apples-to-apples comparison between client IO and recovery -- if you were looking at disk IO stats from both, it would be perfectly reasonable to combine/compare them. When you're looking at Ceph's own counters of client ops vs. recovery activity, that no longer makes sense.
>>>>> >>>
>>>>> >>>> So in the context of a recovery operation, one OSD would report a read (recovery source) and another report a write (recovery target), together with their corresponding num_bytes. To my mind this provides transparency, and maybe helps potential automation.
>>>>> >>>
>>>>> >>> Okay, so if we were talking about disk IO counters, this would probably make sense (one read wouldn't necessarily correspond to one write), but if you had a counter that was telling you how many Ceph recovery push/pull ops were "reading" (being sent) vs "writing" (being received) the totals would just be zero.
>>>>> >>
>>>>> >> Sorry, that should have said the totals would just be equal.
>>>>> >>
>>>>> >> John
>>>>> >>
>>>>> >>> John
>>>>> >>>
>>>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>>>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>>>>> >>>>>>>> Thanks John
>>>>> >>>>>>>>
>>>>> >>>>>>>> This is weird then. When I look at the data with client load I see the following;
>>>>> >>>>>>>> {
>>>>> >>>>>>>>     "pool_name": "default.rgw.buckets.index",
>>>>> >>>>>>>>     "pool_id": 94,
>>>>> >>>>>>>>     "recovery": {},
>>>>> >>>>>>>>     "recovery_rate": {},
>>>>> >>>>>>>>     "client_io_rate": {
>>>>> >>>>>>>>         "read_bytes_sec": 19242365,
>>>>> >>>>>>>>         "write_bytes_sec": 0,
>>>>> >>>>>>>>         "read_op_per_sec": 12514,
>>>>> >>>>>>>>         "write_op_per_sec": 0
>>>>> >>>>>>>>     }
>>>>> >>>>>>>> }
>>>>> >>>>>>>>
>>>>> >>>>>>>> No object related counters - they're all block based. The plugin I have rolls-up the block metrics across all pools to provide total client load.
>>>>> >>>>>>>
>>>>> >>>>>>> Where are you getting the idea that these counters have to do with block storage? What Ceph is telling you about here is the number of operations (or bytes in those operations) being handled by OSDs.
>>>>> >>>>>>
>>>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>>>> >>>>>>
>>>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity against the pool
>>>>> >>>>>>
>>>>> >>>>>> My point is that client-io is expressed in these terms, but recovery activity is not. I was hoping that both recovery and client I/O would be reported in the same way so you gain a view of the activity of the system as a whole. I can sum bytes_sec from client i/o with recovery_rate bytes_sec, which is something, but I can't see inside recovery activity to see how much is read or write, or how much IOP load is coming from recovery.
>>>>> >>>>>
>>>>> >>>>> What would it mean to you for a recovery operation (one OSD sending some data to another OSD) to be read vs. write?
>>>>> >>>>>
>>>>> >>>>> John
>>
>> --
>> Cheers,
>> Brad
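As an illustration of the roll-up being discussed, below is a minimal sketch that sums client_io_rate and recovery_rate across pools, assuming the per-pool JSON shown above comes from `ceph osd pool stats -f json`. The client_io_rate keys are taken from that output; the recovery_rate keys used here (recovering_objects_per_sec, recovering_bytes_per_sec, recovering_keys_per_sec) are assumptions, since only recovering_objects_per_sec is named in the thread. This is not the collectd plugin discussed above, and it deliberately reports client and recovery load side by side rather than combining them, per John's point that they are not directly comparable.

#!/usr/bin/env python
# Sketch only: roll up per-pool rates into cluster-wide totals.
# Assumes the JSON structure shown earlier in the thread; the
# recovery_rate key names are assumptions, not confirmed by it.
import json
import subprocess

CLIENT_KEYS = ("read_bytes_sec", "write_bytes_sec",
               "read_op_per_sec", "write_op_per_sec")
RECOVERY_KEYS = ("recovering_bytes_per_sec",
                 "recovering_objects_per_sec",
                 "recovering_keys_per_sec")


def pool_stats():
    # One sample of per-pool stats: a list of dicts shaped like the one
    # pasted above (pool_name, pool_id, recovery_rate, client_io_rate).
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "stats", "-f", "json"])
    return json.loads(out)


def rollup(stats):
    # Pools omit counters that are currently zero, so every lookup
    # falls back to 0 rather than assuming the key is present.
    totals = dict.fromkeys(CLIENT_KEYS + RECOVERY_KEYS, 0)
    for pool in stats:
        client = pool.get("client_io_rate", {})
        recovery = pool.get("recovery_rate", {})
        for key in CLIENT_KEYS:
            totals[key] += client.get(key, 0)
        for key in RECOVERY_KEYS:
            totals[key] += recovery.get(key, 0)
    return totals


if __name__ == "__main__":
    totals = rollup(pool_stats())
    # Report the two workloads side by side; they are not folded into a
    # single number because they are not directly comparable.
    print("client:   %d B/s read, %d B/s write, %d rd op/s, %d wr op/s" % (
        totals["read_bytes_sec"], totals["write_bytes_sec"],
        totals["read_op_per_sec"], totals["write_op_per_sec"]))
    print("recovery: %d B/s, %d objects/s, %d keys/s" % (
        totals["recovering_bytes_per_sec"],
        totals["recovering_objects_per_sec"],
        totals["recovering_keys_per_sec"]))

A single-metric variant along the lines John mentions (rate of change of objects recovered) would need two samples of the same counters some interval apart; that is left out here to keep the sketch to one read of the stats.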