On Fri, Apr 12, 2019 at 5:42 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> On Fri, Apr 12, 2019 at 5:03 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > (dropped ceph-fs since I just got a "needs approval" bounce from it last time)
> >
> > On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > (CCing ceph-devel since I think it's odd to segregate topics to an
> > > > unlisted mailing list)
> > > >
> > > > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hey Jason,
> > > > >
> > > > > We are working towards bringing `top`-like functionality to CephFS
> > > > > for displaying various client (and MDS) metrics. Since RBD has
> > > > > something similar in the form of `perf image io*` via the rbd CLI, we
> > > > > would like to understand some finer details regarding its
> > > > > implementation and detail how CephFS is moving forward with `fs top`
> > > > > functionality.
> > > > >
> > > > > IIUC, the `rbd_support` manager module requests object perf counters
> > > > > from the OSD, thereby extracting image names from the returned list of
> > > > Technically it extracts the image ids since that's the only thing
> > > > encoded in the object name. The "rbd_support" manager module will
> > > > lazily translate the image ids back to a real image name as needed.
> > > > > hot objects. I'm guessing it's done this way since there is no RBD
> > > > > related active daemon to forward metrics data to the manager? OTOH,
> > > > It's because we are tracking client IO and we don't have a daemon in
> > > > the data path -- the OSDs are the only daemon in the IO path for RBD.
> > > ACK.
> > >
> > > > > `rbd-mirror` does make use of
> > > > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > > > status, which seems to be ok for anything that's not too bulky.
> > > > It's storing metrics that only it knows about. A good parallel
> > > > analogy would be for the MDS to export metrics for things that only it
> > > > would know about (e.g. the number of clients or caps, metadata
> > > > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > > > metadata via the "service_daemon_update_status" API, but it also
> > > > passes PerfCounter metrics automatically to the MGR (see the usage of
> > > > the "rbd_mirror_perf_stats_prio" config option).
> > > > > For forwarding CephFS-related metrics to the Ceph Manager, sticking
> > > > > blobs of metrics data in the daemon status doesn't look clean
> > > > > (although it might work). Therefore, for CephFS, the `MMgrReport`
> > > > > message type is expanded to include metrics data as part of its
> > > > > report update process, as per:
> > > > >
> > > > > https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
> > > > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > > > structure in a generic message. I would think most (if not all) of
> > > Agreed, that's a bit awkward -- but MMgrReport already has
> > > OSD-specific data in there.
> > Figured that the OSDs represent the vast majority of daemons in a Ceph
> > cluster, so they are probably first-tier citizens. We wouldn't want to
> > go down a road with MDS+RGW+NFS Ganesha+iSCSI tcmu-runner+RBD
> > mirror+... one-offs.
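
To make the image-id point from earlier in the thread concrete: RBD data
objects carry the image id in their names, so a hot-object list can be
reduced to per-image counters without any daemon in the data path. Below
is a minimal illustrative sketch in Python -- not the actual rbd_support
code, and the helper names are hypothetical; it only assumes the usual
"rbd_data.<image id>.<object number>" data-object naming:

    import re
    from collections import Counter

    _RBD_DATA_RE = re.compile(r'^rbd_data\.([0-9a-z]+)\.[0-9a-f]+$')

    def image_id_from_object_name(name):
        # Return the image id embedded in an RBD data object name, or None.
        m = _RBD_DATA_RE.match(name)
        return m.group(1) if m else None

    def hot_image_ids(hot_objects):
        # hot_objects: iterable of (object_name, op_count) pairs, e.g. the
        # per-object counters returned by an OSD perf query.
        counts = Counter()
        for name, ops in hot_objects:
            image_id = image_id_from_object_name(name)
            if image_id is not None:
                counts[image_id] += ops
        # Highest-traffic image ids first; id -> name translation can
        # happen lazily, as described above.
        return counts.most_common()

For example, hot_image_ids([("rbd_data.10226b8b4567.0000000000000004", 42)])
would yield [("10226b8b4567", 42)].
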
>
> True -- don't want to pollute generic message types with
> daemon-specific data. As you mentioned, the OSD is probably an exception.
>
> Or, generalize it to support MDS (and other daemons when needed).

Another option is free-form JSON that is delivered (?) to a particular
mgr module.

> > > > your MDS metrics could be passed generically via the PerfCounter
> > > > export mechanism.
> > > Probably, but that would be just aggregated values, right? We would
> > > need per-client metrics.
> > What metrics are you attempting to collect from the client to report
> > back to the MGR?
> Pretty basic as of now:
> - client capability hits
> - OSDC cache hits, readahead util
>
> along with a snapshot of all sessions w/ per-session stats.
> > Does the MDS already have these client metrics? Can
> > the MDS not just provide its own "MDS command" I/F to query those
> > metrics a la what "rbd_support" is providing in the MGR?
> That's where I was coming to -- the MDS (rank 0) would have all the
> metrics that would be shown as part of "top". The MGR can poll the MDS
> for client metadata and only poll the session list if it sees a
> client-id in an OSD stat that it doesn't know about.

Polling doesn't feel like the right approach here. The MDS should just
periodically forward all of these statistics. I also don't see why we
need the OSDs involved.

> I'm thinking forwarding data to the manager would bring benefits in
> the form of caching, etc. done by the MGR.

It also allows the mgr to present the data in the form of graphs on the
dashboard. As suggested elsewhere, I don't think having some script
talk to the MDS to present a CephFS iotop is the way to go. For better
or worse, the mgr is where we handle cluster-wide performance metadata.

--
Patrick Donnelly
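
For illustration, a minimal sketch of the push-based flow argued for
above: rank 0 periodically forwards per-client metrics (cap hits, OSDC
cache hits, readahead utilization) and the mgr keeps only the latest
snapshot and ranks clients for an `fs top`-style view. All names here
are hypothetical -- this is not the actual MMgrReport payload or mgr
module interface, just a sketch of the bookkeeping being discussed:

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class ClientMetrics:
        # Per-client counters mentioned in the thread (hypothetical names).
        cap_hits: int = 0
        cap_misses: int = 0
        osdc_cache_hits: int = 0
        osdc_cache_misses: int = 0
        readahead_util: float = 0.0   # fraction of readahead actually used

    @dataclass
    class FsTopState:
        # client id -> latest snapshot, replaced by each periodic MDS report
        clients: Dict[int, ClientMetrics] = field(default_factory=dict)

        def handle_mds_report(self, report: Dict[int, ClientMetrics]) -> None:
            # Rank 0 pushes these periodically; the mgr never polls.
            self.clients.update(report)

        def top(self, n: int = 10) -> List[Tuple[int, float]]:
            # Rank clients by capability hit rate, highest first.
            def hit_rate(m: ClientMetrics) -> float:
                total = m.cap_hits + m.cap_misses
                return m.cap_hits / total if total else 0.0
            ranked = [(cid, hit_rate(m)) for cid, m in self.clients.items()]
            return sorted(ranked, key=lambda kv: kv[1], reverse=True)[:n]

Keeping the aggregation in the mgr is what lets the dashboard reuse the
same data for graphs, rather than having a separate script query the MDS.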