Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-02-06 at 20:26 +0800, Xiubo Li wrote:
> On 2020/2/6 19:31, Jeff Layton wrote:
> > On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
> > > On 2020/2/6 5:43, Jeff Layton wrote:
> > > > On Wed, 2020-01-29 at 03:27 -0500, xiubli@xxxxxxxxxx wrote:
> > > [...]
> > > > > +
> > > > > +static int sending_metrics_get(void *data, u64 *val)
> > > > > +{
> > > > > +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> > > > > +	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > > > +
> > > > > +	mutex_lock(&mdsc->mutex);
> > > > > +	*val = (u64)mdsc->sending_metrics;
> > > > > +	mutex_unlock(&mdsc->mutex);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> > > > > +			sending_metrics_set, "%llu\n");
> > > > > +
> > > > I'd like to hear more about how we expect users to use this facility.
> > > > This debugfs file doesn't seem consistent with the rest of the UI, and I
> > > > imagine if the box reboots you'd have to (manually) re-enable it after
> > > > mount, right? Maybe this should be a mount option instead?
> > > A mount option means we must do the unmounting to disable it.
> > > 
> > Technically, no. You could wire it up so that you could enable and
> > disable it via -o remount. For example:
> > 
> >      # mount -o remount,metrics=disabled
> 
> Yeah, this is cool.
> 
> > Another option might be a module parameter if this is something that you
> > really want to be global (and not per-mount or per-session).
> > 
> > > I was thinking with the debugfs file we can do the debug or tuning even
> > > in the product setups at any time, usually this should be disabled since
> > > it will send it per second.
> > > 
> > Meh, one frame per second doesn't seem like it'll add much overhead.
> Okay.
> > Also, why one update per second? Should that interval be tunable?
> 
> Per second just keep it the same with the fuse client.
> 

Ok.

> 
> > > Or we could merge the "sending_metric" to "metrics" UI, just writing
> > > "enable"/"disable" to enable/disable sending the metrics to ceph, and
> > > just like the "reset" does to clean the metrics.
> > > 
> > > Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
> > > writable with:
> > > 
> > > "reset"  --> to clean and reset the metrics counters
> > > 
> > > "enable" --> enable sending metrics to ceph cluster
> > > 
> > > "disable" --> disable sending metrics to ceph cluster
> > > 
> > > Will this be better ?
> > > 
> > I guess it's not clear to me how you intend for this to be used.
> > 
> > A debugfs switch means that this is being enabled and disabled on a per-
> > session basis. Is the user supposed to turn this on for all, or just one
> > session? How do they know?
> 
> Not for all, just per-superblock.
> 

If it's per-superblock, then a debugfs-based switch seems particularly
ill-suited for this, as that's really a per-session interface.

> > Is this something we expect people to just turn on briefly when they are
> > experiencing a problem, or is this something that we expect to be turned
> > on and left on for long periods of time?
> 
> If this won't add much overhead even per second, let's keep sending the 
> metrics to ceph always and the mount option for this switch is not 
> needed any more.
> 

Note that I don't really _know_ that it won't be a problem, just that it
doesn't sound too bad. I think we probably will want some mechanism to
enable/disable this until we have some experience with it in the field.

> And there is already a switch to enable/disable showing the metrics in 
> the ceph side, if here add another switch per client, it will be also 
> yucky for admins .
> 
> Let's make the update interval tunable and per second as default. Maybe 
> we should make this as a global UI for all clients ?
> 

If you want a global setting for the interval that would take effect on
all ceph mounts, then maybe a "metric_send_interval" module parameter
would be best. Make it an unsigned int, and allow the admin to set it to
0 to turn off stats transmission in the client.

We have a well-defined interface for setting module parameters on most
distros (via /etc/modprobe.d/), so that would be better than monkeying
around with debugfs here, IMO.

As to the default, it might be best to have this default to 0 initially.
Once we have more experience with it we could make it default to 1 in a
later release.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux