On Fri, Apr 12, 2019 at 4:33 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> (dropped ceph-fs since I just got a "needs approval" bounce from it last time)
>
> On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > >
> > > (CCing ceph-devel since I think it's odd to segregate topics to an
> > > unlisted mailing list)
> > >
> > > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > > >
> > > > Hey Jason,
> > > >
> > > > We are working towards bringing in `top` like functionality to CephFS
> > > > for displaying various client (and MDS) metrics. Since RBD has
> > > > something similar in the form of `perf image io*` via rbd cli, we
> > > > would like to understand some finer details regarding its
> > > > implementation and detail how CephFS is going forward for `fs top`
> > > > functionality.
> > > >
> > > > IIUC, the `rbd_support` manager module requests object perf counters
> > > > from the OSD, thereby extracting image names from the returned list of
> > >
> > > Technically it extracts the image ids since that's the only thing
> > > encoded in the object name. The "rbd_support" manager module will
> > > lazily translate the image ids back to a real image name as needed.
> > >
> > > > hot objects. I'm guessing it's done this way since there is no RBD
> > > > related active daemon to forward metrics data to the manager? OTOH,
> > >
> > > It's because we are tracking client IO and we don't have a daemon in
> > > the data path -- the OSDs are the only daemon in the IO path for RBD.
> >
> > ACK.
> >
> > > > `rbd-mirror` does make use of
> > > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > > status, which seems to be ok for anything that's not too bulky.
> > >
> > > It's storing metrics that only it knows about. The good parallel
> > > analogy would be for the MDS to export metrics for things that only it
> > > would know about (e.g. the number of clients or caps, metadata
> > > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > > metadata via the "service_daemon_update_status" API, but it also
> > > passes PerfCounter metrics automatically to the MGR (see the usage of
> > > the "rbd_mirror_perf_stats_prio" config option).
> > >
> > > > For forwarding CephFS related metrics to Ceph Manager, sticking in
> > > > blobs of metrics data in daemon status doesn't look clean (although it
> > > > might work). Therefore, for CephFS, the `MMgrReport` message type is
> > > > expanded to include metrics data as part of its report update process,
> > > > as per:
> > > >
> > > > https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
> > >
> > > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > > structure in a generic message. I would think most (if not all) of
> >
> > Agreed, that's a bit awkward -- but MMgrReport already has OSD
> > specific data in there.
>
> Figured that the OSDs represent the vast majority of daemons in a Ceph
> cluster, so they are probably first-tier citizens. We wouldn't want to
> go down a road with MDS+RGW+NFS ganesha+iSCSI tcmu-runner+RBD
> mirror+... one-offs.

There's already a "service" concept and we have a ServiceMap in the manager, right? I'm not sure in what ways that is extensible, but I believe it was built to satisfy RGW and Ganesha's needs and should handle the iSCSI daemons etc.
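
For concreteness, getting an arbitrary daemon into the ServiceMap and pushing a status blob to the mgr looks roughly like this through librados (a sketch from memory with a made-up "mygw" service; check the current librados headers for the exact signatures):

  // Illustrative only: a hypothetical "mygw" service registering itself in
  // the ServiceMap and periodically pushing a small key/value status blob.
  // Signatures are from memory and may not match the current librados API.
  #include <rados/librados.hpp>
  #include <map>
  #include <string>

  int main() {
    librados::Rados rados;
    if (rados.init("admin") < 0)       // connect as client.admin
      return 1;
    rados.conf_read_file(nullptr);      // default ceph.conf search path
    if (rados.connect() < 0)
      return 1;

    // one-time registration -- this is what creates the ServiceMap entry
    std::map<std::string, std::string> metadata = {{"instance", "gw1"}};
    rados.service_daemon_register("mygw", "gw1", metadata);

    // periodic update -- small string key/value status, not bulk metrics
    std::map<std::string, std::string> status = {{"state", "idle"}};
    rados.service_daemon_update_status(std::move(status));

    rados.shutdown();
    return 0;
  }

That status ends up in the ServiceMap (visible via "ceph service dump" and the services section of "ceph -s"), which is why it works nicely for small summaries but gets awkward as a transport for bulk per-client metrics.
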
That said, the MDS is definitely in a different category. Unlike those others, it has its own monitor-based maps and clusters within the Ceph ecosystem, is a sink for *Ceph client* IOs as well as a source of them to the OSDs, and is generally a first-tier daemon. If its needs don't fit within the generic service framework, it's perfectly reasonable to give the MDS its own data structures for reporting.

-Greg