On Fri, Apr 12, 2019 at 4:33 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> (dropped ceph-fs since I just got a "needs approval" bounce from it last time)
>
> On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > >
> > > (CCing ceph-devel since I think it's odd to segregate topics to an
> > > unlisted mailing list)
> > >
> > > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > > >
> > > > Hey Jason,
> > > >
> > > > We are working towards bringing in `top` like functionality to CephFS
> > > > for displaying various client (and MDS) metrics. Since RBD has
> > > > something similar in the form of `perf image io*` via rbd cli, we
> > > > would like to understand some finer details regarding its
> > > > implementation and detail how CephFS is going forward for `fs top`
> > > > functionality.
> > > >
> > > > IIUC, the `rbd_support` manager module requests object perf counters
> > > > from the OSD, thereby extracting image names from the returned list of
> > >
> > > Technically it extracts the image ids since that's the only thing
> > > encoded in the object name. The "rbd_support" manager module will
> > > lazily translate the image ids back to a real image name as needed.
> > >
> > > > hot objects. I'm guessing it's done this way since there is no RBD
> > > > related active daemon to forward metrics data to the manager? OTOH,
> > >
> > > It's because we are tracking client IO and we don't have a daemon in
> > > the data path -- the OSDs are the only daemon in the IO path for RBD.
> >
> > ACK.
> >
> > > > `rbd-mirror` does make use of
> > > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > > status, which seems to be ok for anything that's not too bulky.
> > >
> > > It's storing metrics that only it knows about. The good parallel
> > > analogy would be for the MDS to export metrics for things that only it
> > > would know about (e.g. the number of clients or caps, metadata
> > > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > > metadata via the "service_daemon_update_status" API, but it also
> > > passes PerfCounter metrics automatically to the MGR (see the usage of
> > > the "rbd_mirror_perf_stats_prio" config option).
> > >
> > > > For forwarding CephFS related metrics to Ceph Manager, sticking in
> > > > blobs of metrics data in daemon status doesn't look clean (although it
> > > > might work). Therefore, for CephFS, the `MMgrReport` message type is
> > > > expanded to include metrics data as part of its report update process,
> > > > as per:
> > > >
> > > > https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
> > >
> > > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > > structure in a generic message. I would think most (if not all) of
> >
> > Agreed, that's a bit awkward -- but MMgrReport already has OSD
> > specific data in there.
>
> Figured that the OSDs represent the vast majority of daemons in a Ceph
> cluster, so they are probably first-tier citizens. We wouldn't want to
> go down a road with MDS+RGW+NFS ganesha+iSCSI tcmu-runner+RBD
> mirror+... one-offs.

There's already a "service" concept and we have a ServiceMap in the manager, right? I'm not sure in what ways that is extensible, but I believe it was built to satisfy RGW and Ganesha's needs and should handle the iSCSI daemons etc.
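
For concreteness, getting an arbitrary daemon into the ServiceMap and pushing a status blob to the mgr looks roughly like this through librados (a sketch from memory with a made-up "mygw" service; check the current librados headers for the exact signatures):

  // Illustrative only: a hypothetical "mygw" service registering itself in
  // the ServiceMap and periodically pushing a small key/value status blob.
  // Signatures are from memory and may not match the current librados API.
  #include <rados/librados.hpp>
  #include <map>
  #include <string>

  int main() {
    librados::Rados rados;
    if (rados.init("admin") < 0)       // connect as client.admin
      return 1;
    rados.conf_read_file(nullptr);      // default ceph.conf search path
    if (rados.connect() < 0)
      return 1;

    // one-time registration -- this is what creates the ServiceMap entry
    std::map<std::string, std::string> metadata = {{"instance", "gw1"}};
    rados.service_daemon_register("mygw", "gw1", metadata);

    // periodic update -- small string key/value status, not bulk metrics
    std::map<std::string, std::string> status = {{"state", "idle"}};
    rados.service_daemon_update_status(std::move(status));

    rados.shutdown();
    return 0;
  }

That status ends up in the ServiceMap (visible via "ceph service dump" and the services section of "ceph -s"), which is why it works nicely for small summaries but gets awkward as a transport for bulk per-client metrics.
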
That said, the MDS is definitely in a different category. Unlike those others, it has its own monitor-based maps and clusters within the Ceph ecosystem, is a sink for *Ceph client* IOs as well as a source of them to the OSDs, and is generally a first-tier daemon. If its needs don't fit within the generic service framework, it's perfectly reasonable to give the MDS its own data structures for reporting.

-Greg