Re: Ceph RBD/FS top

On Fri, Apr 12, 2019 at 5:03 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> (dropped ceph-fs since I just got a "needs approval" bounce from it last time)
>
> On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > >
> > > (CCing ceph-devel since I think it's odd to segregate topics to an
> > > unlisted mailing list)
> > >
> > > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > > >
> > > > Hey Jason,
> > > >
> > > > We are working towards bringing in `top` like functionality to CephFS
> > > > for displaying various client (and MDS) metrics. Since RBD has
> > > > something similar in the form of `perf image io*` via rbd cli, we
> > > > would like to understand some finer details regarding its
> > > > implementation and detail how CephFS is going forward for `fs top`
> > > > functionality.
> > > >
> > > > IIUC, the `rbd_support` manager module requests object perf counters
> > > > from the OSD, thereby extracting image names from the returned list of
> > >
> > > Technically it extracts the image ids since that's the only thing
> > > encoded in the object name. The "rbd_support" manager module will
> > > lazily translate the image ids back to a real image name as needed.
> > >
> > > > hot objects. I'm guessing it's done this way since there is no RBD
> > > > related active daemon to forward metrics data to the manager? OTOH,
> > >
> > > It's because we are tracking client IO and we don't have a daemon in
> > > the data path -- the OSDs are the only daemon in the IO path for RBD.
> >
> > ACK.
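
Also, just to double check my understanding of the image id extraction
above -- a toy sketch of what I think the module does (resolve_name is a
stand-in for the actual lookup, not real rbd_support code):

# Toy sketch only: pull the image id out of an RBD data object name and
# resolve it to an image name lazily, caching the result.

def image_id_from_object(object_name: str) -> str:
    # IIUC, data objects look like "rbd_data.<image id>.<object number>"
    prefix = "rbd_data."
    if not object_name.startswith(prefix):
        raise ValueError("not an rbd data object")
    return object_name[len(prefix):].rsplit(".", 1)[0]

name_cache = {}

def lazy_image_name(image_id: str, resolve_name) -> str:
    # resolve_name is a placeholder for however the id -> name lookup is
    # actually done; only call it the first time we see a given id.
    if image_id not in name_cache:
        name_cache[image_id] = resolve_name(image_id)
    return name_cache[image_id]

print(image_id_from_object("rbd_data.10226b8b4567.0000000000000004"))
# -> 10226b8b4567
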
> >
> > >
> > > > `rbd-mirror` does make use of
> > > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > > status, which seems to be ok for anything that's not too bulky.
> > >
> > > It's storing metrics that only it knows about. The good parallel
> > > analogy would be for the MDS to export metrics for things that only it
> > > would know about (e.g. the number of clients or caps, metadata
> > > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > > metadata via the "service_daemon_update_status" API, but it also
> > > passes PerfCounter metrics automatically to the MGR (see the usage of
> > > the "rbd_mirror_perf_stats_prio" config option).
> > >
> > > > For forwarding CephFS-related metrics to the Ceph Manager, sticking
> > > > blobs of metrics data in the daemon status doesn't look clean (although
> > > > it might work). Therefore, for CephFS, the `MMgrReport` message type is
> > > > expanded to include metrics data as part of its report update process,
> > > > as per:
> > > >
> > > >         https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
> > >
> > > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > > structure in a generic message. I would think most (if not all) of
> >
> > Agreed, that's a bit awkward -- but MMgrReport already has OSD-specific
> > data in there.
>
> Figured that the OSDs represent the vast majority of daemons in a Ceph
> cluster, so they are probably first-class citizens. We wouldn't want to
> go down a road with MDS+RGW+NFS Ganesha+iSCSI tcmu-runner+RBD
> mirror+... one-offs.

True -- we don't want to pollute generic message types with
daemon-specific data. As you mentioned, the OSD is probably an exception.

Or, we could generalize it to support the MDS (and other daemons when
needed).
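
Roughly what I have in mind for that -- a hypothetical sketch of a
daemon-agnostic report payload (the names below are made up for
illustration; they're not the actual MMgrReport fields):

from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical shape only -- the real message is C++; this just models
# a report that any daemon could fill without daemon-specific fields.

@dataclass
class MetricsReport:
    daemon_type: str                    # "mds", "osd", "rbd-mirror", ...
    daemon_id: str                      # rank or daemon name
    # free-form metric sets keyed by a schema name that the consuming
    # MGR module understands
    metric_sets: Dict[str, List[dict]] = field(default_factory=dict)

report = MetricsReport(
    daemon_type="mds",
    daemon_id="0",
    metric_sets={
        "cephfs_client_metrics": [
            {"client_id": 4215, "cap_hit_rate": 0.93, "osdc_cache_hit_rate": 0.71},
        ],
    },
)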

>
> > > your MDS metrics could be passed generically via the PerfCounter
> > > export mechanism.
> >
> > Probably, but that would just be aggregated values, right? We would
> > need per-client metrics.
>
> What metrics are you attempting to collect from the client to report
> back to the MGR?

Pretty basic as of now:
- client capability hits
- OSDC cache hits, readahead utilization

along with a snapshot of all sessions with per-session stats.
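
For illustration, something like this is what a client could
periodically push to the MDS (field names are made up, not an existing
wire format):

import time

# Hypothetical per-client update; the MDS (rank 0) would keep the
# latest snapshot per session and hand the whole map to whoever asks.
client_metrics_update = {
    "client_id": 4215,
    "timestamp": time.time(),
    "cap_hits": 18230,                  # capability cache hits
    "cap_misses": 1410,                 # capability cache misses
    "osdc_cache_hits": 9120,            # ObjectCacher hits
    "osdc_cache_misses": 3775,
    "readahead_bytes_useful": 41943040,
    "readahead_bytes_total": 52428800,
}

sessions = {client_metrics_update["client_id"]: client_metrics_update}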

> Does the MDS already have these client metrics? Can
> the MDS not just provide its own "MDS command" I/F to query those
> metrics a la what "rbd_support" is providing in the MGR?

That's what I was getting at -- the MDS (rank 0) would have all the
metrics that would be shown as part of "top". The MGR can poll the MDS
for client metadata, and only poll the session list if it sees a
client-id in the OSD stats that it doesn't know about.

I'm also wondering whether forwarding data to the manager would bring
benefits in the form of caching, etc. done by the MGR.
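
Roughly the kind of MGR-side caching I mean (mds_query below is a
stand-in for however the MGR would actually query the MDS, not an
existing API):

from typing import Callable, Dict

class ClientMetadataCache:
    def __init__(self, mds_query: Callable[[str], list]):
        self.mds_query = mds_query            # assumed to return a session list
        self.clients: Dict[int, dict] = {}    # client-id -> session metadata

    def lookup(self, client_id: int) -> dict:
        # Only hit the MDS when we see a client-id we don't know about;
        # otherwise serve from the cached session metadata.
        if client_id not in self.clients:
            for session in self.mds_query("session ls"):
                self.clients[session["id"]] = session
        return self.clients.get(client_id, {})

# e.g., with a canned response standing in for the MDS:
cache = ClientMetadataCache(lambda cmd: [{"id": 4215, "entity": "client.4215"}])
print(cache.lookup(4215)["entity"])   # -> client.4215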

>
> > >
> > > > ... and a callback function is provided to `MgrClient` (invoked
> > > > periodically) to fill in the appropriate metrics data in its report.
> > > > This works well and is similar to how the OSD updates PG stats to the
> > > > Ceph Manager.
> > > >
> > > > I guess changes of this nature were not required by RBD as it can get
> > > > the required data by querying the OSDs (and were other approaches
> > > > considered)?
> > >
> > > You would also want to be able to query client data IO metrics via the
> > > OSDs -- you won't get that data from the MDS, right (unless the
> > > clients are already posting metrics back to the MDS periodically)?
> >
> > I haven't yet gotten around to figuring out the set of metrics that can
> > be fetched via the OSDs.
>
> IO performance (throughput/latency)?
>
> > Clients would post to the MDS data that they're aware of, such as
> > cache utilization, OSDC buffer readahead usage, etc.
>
> To me, those don't sound like "iotop"-like performance metrics, but
> instead like client state.

By default, closely resembling "iotop" would be good. But there is also
benefit in putting out fs-specific metrics (such as "cap hit rate" -- a
measure of how well client capability caching is working).
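
As a trivial example of the kind of derived, fs-specific metric I mean
(counter names here are illustrative only):

def cap_hit_rate(cap_hits: int, cap_misses: int) -> float:
    # fraction of capability lookups served from the client's cache
    total = cap_hits + cap_misses
    return cap_hits / total if total else 0.0

print(f"cap hit rate: {cap_hit_rate(18230, 1410):.2%}")   # -> 92.82%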

>
> > >
> > > > Thanks,
> > > > -Venky
> > >
> > > --
> > > Jason
> >
> >
> >
> > --
> >     Venky
>
> --
> Jason



-- 
    Venky


