Re: Ceph RBD/FS top

(dropped ceph-fs since I just got a "needs approval" bounce from it last time)

On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > (CCing ceph-devel since I think it's odd to segregate topics to an
> > unlisted mailing list)
> >
> > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > >
> > > Hey Jason,
> > >
> > > We are working towards bringing `top`-like functionality to CephFS
> > > for displaying various client (and MDS) metrics. Since RBD has
> > > something similar in the form of `perf image io*` via the rbd CLI,
> > > we would like to understand some finer details of its implementation
> > > and outline how CephFS plans to provide `fs top` functionality.
> > >
> > > IIUC, the `rbd_support` manager module requests object perf counters
> > > from the OSD, thereby extracting image names from the returned list of
> >
> > Technically it extracts the image ids since that's the only thing
> > encoded in the object name. The "rbd_support" manager module will
> > lazily translate the image ids back to a real image name as needed.
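
(For reference, that id extraction boils down to roughly the following --
a sketch assuming the usual "rbd_data.<image id>.<object number>" data
object naming, not the actual rbd_support code:)

    import re

    # Sketch only (not the rbd_support code): v2 RBD data objects are named
    # "rbd_data.<image id>.<object number>" (with a pool id wedged in for
    # images that use a separate data pool), so a hot object's name gives
    # you the image id directly; mapping that id back to an image name
    # needs a separate, lazy lookup.
    DATA_OBJ_RE = re.compile(r'^rbd_data\.(?:\d+\.)?([0-9a-f]+)\.[0-9a-f]{16}$')

    def image_id_from_object_name(name):
        m = DATA_OBJ_RE.match(name)
        return m.group(1) if m else None

    assert image_id_from_object_name(
        'rbd_data.10226b8b4567.0000000000000000') == '10226b8b4567'
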
> >
> > > hot objects. I'm guessing it's done this way since there is no RBD
> > > related active daemon to forward metrics data to the manager? OTOH,
> >
> > It's because we are tracking client IO and we don't have a daemon in
> > the data path -- the OSDs are the only daemon in the IO path for RBD.
>
> ACK.
>
> >
> > > `rbd-mirror` does make use of
> > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > status, which seems to be ok for anything that's not too bulky.
> >
> > It's storing metrics that only it knows about. A good analogy would
> > be for the MDS to export metrics for things that only it knows about
> > (e.g. the number of clients or caps, metadata
> > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > metadata via the "service_daemon_update_status" API, but it also
> > passes PerfCounter metrics automatically to the MGR (see the usage of
> > the "rbd_mirror_perf_stats_prio" config option).
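
(If it helps, those status blobs end up in the mgr's service map and can
be inspected with "ceph service dump" -- a quick sketch, with the dump's
exact layout treated as an assumption:)

    import json
    import subprocess

    # The JSON blobs rbd-mirror publishes via service_daemon_update_status()
    # land in the mgr's service map; "ceph service dump" exposes them. The
    # services -> rbd-mirror -> daemons layout below is an assumption about
    # the dump format.
    dump = json.loads(
        subprocess.check_output(['ceph', 'service', 'dump', '--format', 'json']))
    daemons = dump.get('services', {}).get('rbd-mirror', {}).get('daemons', {})
    for name, daemon in daemons.items():
        if not isinstance(daemon, dict):
            continue  # skip non-daemon entries such as a "summary" string
        print(name, json.dumps(daemon.get('status', {}), indent=2))
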
> >
> > > For forwarding CephFS-related metrics to the Ceph Manager, sticking
> > > blobs of metrics data into the daemon status doesn't look clean
> > > (although it might work). Therefore, for CephFS, the `MMgrReport`
> > > message type is expanded to include metrics data as part of its
> > > report update process, as per:
> > >
> > >         https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
> >
> > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > structure in a generic message. I would think most (if not all) of
>
> Agreed, that's a bit awkward -- but MMgrReport already has OSD
> specific data in there.

I figured that the OSDs represent the vast majority of daemons in a Ceph
cluster, so they are probably first-class citizens. We wouldn't want to
go down the road of MDS + RGW + NFS Ganesha + iSCSI tcmu-runner +
rbd-mirror + ... one-offs.

> > your MDS metrics could be passed generically via the PerfCounter
> > export mechanism.
>
> Probably, but that would be just aggregated values, right? We would
> need per-client metrics.

What metrics are you attempting to collect from the client to report
back to the MGR? Does the MDS already have these client metrics? Can
the MDS not just provide its own "MDS command" I/F to query those
metrics a la what "rbd_support" is providing in the MGR?
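
For example, something along these lines already works today for
per-client state the MDS tracks (a sketch; the MDS name and the exact
session fields are placeholders/assumptions):

    import json
    import subprocess

    # Per-client state the MDS already tracks can be pulled with a tell
    # command today, e.g. "session ls"; field names vary a bit by release.
    def mds_sessions(mds_id):
        out = subprocess.check_output(
            ['ceph', 'tell', 'mds.{}'.format(mds_id), 'session', 'ls'])
        return json.loads(out)

    for s in mds_sessions('a'):  # 'a' is a placeholder MDS name
        print(s.get('id'), s.get('num_caps'),
              s.get('client_metadata', {}).get('hostname'))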

> >
> > > ... and a callback function is provided to `MgrClient` (invoked
> > > periodically) to fill in appropriate metrics data in its report. This
> > > works well and is similar to how the OSD updates PG stats to the
> > > Ceph Manager.
> > >
> > > I guess changes of this nature were not required by RBD since it can
> > > get the required data by querying the OSDs (were other approaches
> > > considered for this)?
> >
> > You would also want to be able to query client data IO metrics via the
> > OSDs -- you won't get that data from the MDS, right (unless the
> > clients are already posting metrics back to the MDS periodically)?
>
> I haven't gotten into figuring out (yet) the set of metrics that can
> be fetched via the OSD.

IO performance (throughput/latency)?
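
For the data path, the same OSD perf query mechanism that rbd_support
uses could presumably be keyed by client id instead of image id. A rough
sketch of the mgr-module side, with the query schema borrowed from the
osd_perf_query module and to be treated as an assumption:

    from mgr_module import MgrModule

    class ClientIOTop(MgrModule):
        # Sketch of the "ask the OSDs" approach for the data path: register
        # an OSD perf query keyed by client id (rbd_support keys by rbd
        # image id instead) and poll the aggregated counters. Method names
        # are from the MgrModule interface the osd_perf_query/rbd_support
        # modules use; treat the exact query schema as an assumption.
        def __init__(self, *args, **kwargs):
            super(ClientIOTop, self).__init__(*args, **kwargs)
            self.query_id = None

        def poll(self):
            if self.query_id is None:
                self.query_id = self.add_osd_perf_query({
                    'key_descriptor': [{'type': 'client_id'}],
                    'performance_counter_descriptors': [
                        'write_ops', 'read_ops', 'write_bytes', 'read_bytes',
                        'write_latency', 'read_latency',
                    ],
                })
            # Raw per-client counters as reported back by the OSDs; these
            # only cover the data path, not MDS-side state.
            return self.get_osd_perf_counters(self.query_id)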

> Clients would post to the MDS the data they are aware of, such as
> cache utilization, OSDC buffer readahead usage, etc.

To me, those don't sound like "iotop"-like performance metrics, but
rather like client state.
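
(Those client-side numbers are already visible per client via the
client's admin socket today, e.g. for ceph-fuse -- a rough sketch, with
the socket path as a placeholder:)

    import json
    import subprocess

    # Placeholder path -- the real asok name depends on the client instance.
    ASOK = '/var/run/ceph/ceph-client.admin.asok'

    # A ceph-fuse admin socket "perf dump" includes client and objectcacher
    # counters (cache usage, readahead, etc.); section names vary by release.
    dump = json.loads(
        subprocess.check_output(['ceph', 'daemon', ASOK, 'perf', 'dump']))
    for section, counters in dump.items():
        if 'client' in section or 'objectcacher' in section:
            print(section, json.dumps(counters, indent=2))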

> >
> > > Thanks,
> > > -Venky
> >
> > --
> > Jason
>
>
>
> --
>     Venky

-- 
Jason


