Re: Ceph RBD/FS top

(CCing ceph-devel since I think it's odd to segregate topics to an
unlisted mailing list)

On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
>
> Hey Jason,
>
> We are working towards bringing `top`-like functionality to CephFS
> for displaying various client (and MDS) metrics. Since RBD has
> something similar in the form of `perf image io*` via the rbd CLI, we
> would like to understand some finer details of its implementation
> and outline how CephFS plans to implement `fs top`.
>
> IIUC, the `rbd_support` manager module requests object perf counters
> from the OSDs, then extracts image names from the returned list of

Technically it extracts the image ids, since that's the only thing
encoded in the object name. The "rbd_support" manager module lazily
translates the image ids back to real image names as needed (see the
sketch below).

> hot objects. I'm guessing it's done this way since there is no
> RBD-related active daemon to forward metrics data to the manager? OTOH,

It's because we are tracking client IO and we don't have a daemon in
the data path -- the OSDs are the only daemons in the IO path for RBD.
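
For concreteness, the flow from the manager module's point of view is
roughly the following. I'm writing the MgrModule method names and query
fields (add_osd_perf_query / get_osd_perf_counters, the "object_name"
key descriptor and its regex) from memory, so double-check them against
mgr_module.py rather than treating this as the exact interface:

from mgr_module import MgrModule

class PerfQuerySketch(MgrModule):
    def start(self):
        # Ask the OSDs to dynamically account IO per object-name prefix.
        # For RBD the capture group is effectively the image id, since
        # that's all the object name encodes.
        self.query_id = self.add_osd_perf_query({
            'key_descriptor': [
                {'type': 'object_name',
                 'regex': r'^(?:rbd|journal)_data\.(?:([0-9]+)\.)?([^.]+)\.'},
            ],
            'performance_counter_descriptors': [
                'write_ops', 'read_ops', 'write_bytes', 'read_bytes',
                'write_latency', 'read_latency',
            ],
        })

    def poll(self):
        # Aggregated counters for the busiest keys seen so far. The image
        # id keys are translated to image names lazily (and cached), e.g.
        # by listing the pool once, rather than on every refresh.
        return self.get_osd_perf_counters(self.query_id)

    def stop(self):
        self.remove_osd_perf_query(self.query_id)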

> `rbd-mirror` does make use of
> `MgrClient::service_daemon_update_status()` to forward mirror daemon
> status, which seems to be ok for anything that's not too bulky.

It's storing metrics that only it knows about. A good parallel would be
for the MDS to export metrics for things that only it knows about (e.g.
the number of clients or caps, metadata
read/write rates). The "rbd-mirror" daemon stores JSON-encoded
metadata via the "service_daemon_update_status" API, but it also
passes PerfCounter metrics automatically to the MGR (see the usage of
the "rbd_mirror_perf_stats_prio" config option).

> For forwarding CephFS-related metrics to the Ceph Manager, sticking
> blobs of metrics data into the daemon status doesn't look clean
> (although it might work). Therefore, for CephFS, the `MMgrReport`
> message type is expanded to include metrics data as part of its report
> update process, as per:
>
>         https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e

Just my 2 cents, but I think it's awkward to put an MDS-unique data
structure in a generic message. I would think most (if not all) of
your MDS metrics could be passed generically via the PerfCounter
export mechanism.
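
An `fs top` module on the mgr side could then consume those counters with
the stock helpers instead of teaching MMgrReport about CephFS. Something
along these lines (again just a sketch; the get_all_perf_counters()
helper and its output shape are from memory):

from mgr_module import MgrModule

class FsTopSketch(MgrModule):
    def mds_metrics(self):
        # get_all_perf_counters() returns recent values for every counter
        # whose priority clears the mgr stats threshold, keyed by
        # "<svc_type>.<svc_id>" (e.g. "mds.a").
        all_counters = self.get_all_perf_counters()
        return {svc: counters for svc, counters in all_counters.items()
                if svc.startswith('mds.')}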

> ... and a callback function is provided to `MgrClient` (invoked
> periodically) to fill in the appropriate metrics data in its report. This
> works well and is similar to how the OSD reports PG stats to the Ceph
> Manager.
>
> I guess changes of this nature were not required by RBD since it can get
> the required data by querying the OSDs (and were other approaches
> considered here)?

You would also want to be able to query client data IO metrics via the
OSDs -- you won't get that data from the MDS, right (unless the
clients are already posting metrics back to the MDS periodically)?
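
The same dynamic OSD query mechanism could probably feed that side of
`fs top`, just keyed on the client instead of the object name. Sketch
only -- whether 'client_id' is among the supported key descriptor types
is an assumption I'd want to verify:

from mgr_module import MgrModule

class ClientIoSketch(MgrModule):
    def start(self):
        # Per-client data-path IO as seen by the OSDs.
        self.query_id = self.add_osd_perf_query({
            'key_descriptor': [{'type': 'client_id'}],
            'performance_counter_descriptors': [
                'write_ops', 'read_ops', 'write_bytes', 'read_bytes',
            ],
        })

    def poll(self):
        return self.get_osd_perf_counters(self.query_id)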

> Thanks,
> -Venky

-- 
Jason


