Re: Ceph RBD/FS top

On Fri, Apr 12, 2019 at 5:14 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Fri, Apr 12, 2019 at 4:33 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > (dropped ceph-fs since I just got a "needs approval" bounce from it last time)
> >
> > On Fri, Apr 12, 2019 at 4:27 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 11, 2019 at 6:58 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > >
> > > > (CCing ceph-devel since I think it's odd to segregate topics to an
> > > > unlisted mailing list)
> > > >
> > > > On Thu, Apr 11, 2019 at 9:12 AM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hey Jason,
> > > > >
> > > > > We are working towards bringing `top`-like functionality to CephFS
> > > > > for displaying various client (and MDS) metrics. Since RBD has
> > > > > something similar in the form of `perf image io*` via the rbd CLI, we
> > > > > would like to understand some finer details of its implementation
> > > > > and outline how CephFS plans to implement `fs top`
> > > > > functionality.
> > > > >
> > > > > IIUC, the `rbd_support` manager module requests object perf counters
> > > > > from the OSDs and extracts image names from the returned list of
> > > >
> > > > Technically it extracts the image ids since that's the only thing
> > > > encoded in the object name. The "rbd_support" manager module will
> > > > lazily translate the image ids back to a real image name as needed.
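
(For illustration only: a rough sketch of that aggregation and lazy
name-resolution step, assuming the simplified `rbd_data.<image id>.<object
number>` object naming and a caller-supplied id-to-name resolver; this is
not the actual `rbd_support` code.)

```python
from collections import Counter

def image_id_from_object(object_name):
    # Data objects for v2 images are named "rbd_data.<image id>.<object
    # number>" (simplified here; variants with pool/namespace prefixes
    # are ignored for the sake of the example).
    parts = object_name.split(".")
    if len(parts) == 3 and parts[0] == "rbd_data":
        return parts[1]
    return None

def hot_images(hot_objects, resolve_name, top_n=10):
    """Fold per-object op counts into per-image op counts.

    `hot_objects` stands in for the (object name, op count) pairs the
    OSD perf query returns, and `resolve_name` is a hypothetical
    callback that maps an image id back to its current name. Names are
    resolved lazily -- only for the images that will be displayed.
    """
    per_image = Counter()
    for object_name, ops in hot_objects:
        image_id = image_id_from_object(object_name)
        if image_id is not None:
            per_image[image_id] += ops
    return [(resolve_name(image_id), ops)
            for image_id, ops in per_image.most_common(top_n)]

# Example:
# hot_images([("rbd_data.10226b8b4567.0000000000000000", 42)],
#            resolve_name=lambda image_id: f"image-{image_id}")
```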
> > > >
> > > > > hot objects. I'm guessing it's done this way since there is no RBD
> > > > > related active daemon to forward metrics data to the manager? OTOH,
> > > >
> > > > It's because we are tracking client IO and we don't have a daemon in
> > > > the data path -- the OSDs are the only daemons in the IO path for RBD.
> > >
> > > ACK.
> > >
> > > >
> > > > > `rbd-mirror` does make use of
> > > > > `MgrClient::service_daemon_update_status()` to forward mirror daemon
> > > > > status, which seems to be ok for anything that's not too bulky.
> > > >
> > > > It's storing metrics that only it knows about. A good parallel
> > > > would be for the MDS to export metrics for things that only it
> > > > would know about (e.g. the number of clients or caps, metadata
> > > > read/write rates). The "rbd-mirror" daemon stores JSON-encoded
> > > > metadata via the "service_daemon_update_status" API, but it also
> > > > passes PerfCounter metrics automatically to the MGR (see the usage of
> > > > the "rbd_mirror_perf_stats_prio" config option).
> > > >
> > > > > For forwarding CephFS-related metrics to the Ceph Manager, sticking
> > > > > blobs of metrics data into the daemon status doesn't look clean
> > > > > (although it might work). Therefore, for CephFS, the `MMgrReport`
> > > > > message type is expanded to include metrics data as part of its
> > > > > report update process, as per:
> > > > >
> > > > >         https://github.com/ceph/ceph/pull/26004/commits/a75570c0e73ef67bbca8f73a9742e10bb9deb505#diff-b7b92973d97c21398c2be357f6a38b3e
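
(For context, something along these lines is what a per-client metrics
payload could conceptually carry; the field names below are invented and do
not mirror the structures in the PR above.)

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClientMetrics:
    # Invented fields -- roughly the kind of per-client data an MDS
    # could report, not the actual encoding used by the PR.
    client_id: int
    cap_hit_rate: float
    read_latency_ms: float
    write_latency_ms: float

@dataclass
class MDSMetricsReport:
    mds_rank: int
    clients: List[ClientMetrics] = field(default_factory=list)
```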
> > > >
> > > > Just my 2 cents, but I think it's awkward to put an MDS-unique data
> > > > structure in a generic message. I would think most (if not all) of
> > >
> > > Agreed, that's a bit awkward -- but MMgrReport already has OSD-specific
> > > data in there.
> >
> > I figured that the OSDs represent the vast majority of daemons in a Ceph
> > cluster, so they are probably first-tier citizens. We wouldn't want to
> > go down the road of MDS + RGW + NFS Ganesha + iSCSI tcmu-runner +
> > rbd-mirror + ... one-offs.
>
> There's already a "service" concept and we have a ServiceMap in the
> manager, right? I'm not sure in what ways that is extensible, but I
> believe it was built to satisfy RGW and Ganesha's needs and should
> handle the iSCSI daemons etc.
> That said, the MDS is definitely in a different category. Unlike those
> others, it has its own monitor-based maps and clusters within the Ceph
> ecosystem, is a sink for *Ceph client* IOs as well as a source of them
> to the OSDs, and is generally a first-tier daemon. If its needs don't
> fit within the generic service framework it's perfectly reasonable to
> give the MDS its own data structures for reporting.
> -Greg

I guess that gets back to my point. If the MDS already has this
information, why does it need to duplicate it in the MGR? The
rationale for the OSDs sending their client-level statistics to the
MGR is that the MGR needs to aggregate them into a usable form. If the
MDS already has the data, why can't any "fs top" utility just query it
from the handful of MDS servers?
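
(To make that alternative concrete, a minimal sketch of such a poller,
assuming the per-daemon admin socket `perf dump` command is available where
it runs and that the counter names used below exist on the release in
question; a real tool would talk to each MDS over the network rather than
shelling out.)

```python
import json
import subprocess
import time

def mds_perf_dump(mds_name):
    # Admin-socket query; only works on the host running that MDS. A
    # real "fs top" would query each MDS directly over the network.
    out = subprocess.check_output(
        ["ceph", "daemon", f"mds.{mds_name}", "perf", "dump"])
    return json.loads(out)

def poll(mds_names, interval=5.0):
    """Print a crude per-MDS client request rate by diffing samples."""
    last = {name: mds_perf_dump(name) for name in mds_names}
    while True:
        time.sleep(interval)
        for name in mds_names:
            cur = mds_perf_dump(name)
            # "mds"/"request" is used as the client request counter here;
            # exact counter names vary by release.
            delta = (cur.get("mds", {}).get("request", 0)
                     - last[name].get("mds", {}).get("request", 0))
            print(f"mds.{name}: {delta / interval:.1f} req/s")
            last[name] = cur

if __name__ == "__main__":
    poll(["a"])  # replace with the MDS daemon names for the cluster
```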

Again, just my 2 cents, but it all ties back to my opinion that the
MGR is becoming a single point of failure for Ceph since it doesn't
horizontally scale its workload. If we can avoid sending data to the
MGR when we don't need to, that would seem like a good thing.

-- 
Jason


