Re: [PATCH 1/2] ceph: periodically send perf metrics to ceph

On 2020/6/17 2:58, Jeff Layton wrote:
On Wed, 2020-06-17 at 02:14 +0800, Xiubo Li wrote:
On 2020/6/16 21:40, Jeff Layton wrote:
On Tue, 2020-06-16 at 08:52 -0400, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

This will send the caps/read/write/metadata metrics to any available
MDS once per second by default, the same as the userland client, or
every metric_send_interval seconds; metric_send_interval is a module
parameter.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
   fs/ceph/mds_client.c         |  46 +++++++++------
   fs/ceph/mds_client.h         |   4 ++
   fs/ceph/metric.c             | 133 +++++++++++++++++++++++++++++++++++++++++++
   fs/ceph/metric.h             |  78 +++++++++++++++++++++++++
   fs/ceph/super.c              |  29 ++++++++++
   include/linux/ceph/ceph_fs.h |   1 +
   6 files changed, 274 insertions(+), 17 deletions(-)
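
As context for the review below: the mechanism here is a self-rearming
delayed work item, where the worker sends the metrics and then
re-schedules itself. A minimal sketch, assuming a worker along the
lines of the one this patch adds in metric.c (the encoding and sending
of the CEPH_MSG_CLIENT_METRICS message is elided):

/* Minimal sketch of the periodic send loop; the worker name and the
 * struct layout are assumed from this patch, and the actual encode
 * and send of the metrics message is elided. */
static void metric_delayed_work(struct work_struct *work)
{
	struct ceph_client_metric *m =
		container_of(work, struct ceph_client_metric,
			     delayed_work.work);

	/* ... encode and send the metrics to an available MDS ... */

	metric_schedule_delayed(m);	/* re-arm for the next interval */
}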

Long patch! Might not hurt to break out some of the cleanups into
separate patches, but they're fairly straightforward so I won't require
it.
Sure, I will split the patch into smaller ones where possible.


[...]

   /* This is the global metrics */
   struct ceph_client_metric {
   	atomic64_t            total_dentries;
@@ -35,8 +100,21 @@ struct ceph_client_metric {
   	ktime_t metadata_latency_sq_sum;
   	ktime_t metadata_latency_min;
   	ktime_t metadata_latency_max;
+
+	struct delayed_work delayed_work;  /* periodic metric send */
   };
+static inline void metric_schedule_delayed(struct ceph_client_metric *m)
+{
+	/* 0 disables the periodic send */
+	if (!metric_send_interval)
+		return;
+
+	/* once per second by default */
+	schedule_delayed_work(&m->delayed_work,
+			      round_jiffies_relative(HZ * metric_send_interval));
+}
+
Should this also be gated on an MDS feature bit or anything? What happens
if we're running against a really old MDS that doesn't support these
stats? Does it just drop them on the floor? Should we disable sending
them in that case?
I tested against a ceph build without the metric code: when ceph saw
an unknown message type, it closed the socket connection directly.

Ouch, that sounds bad. How does the userland client handle this? This
seems like the kind of thing we usually add a feature bit for
somewhere...

The userland client doesn't handle this case either. I will try to add a feature bit or something to fix it.
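
For illustration, a minimal sketch of what such a gate could look like,
assuming a hypothetical CEPHFS_FEATURE_METRIC_COLLECT bit advertised in
the mdsmap (the bit name and the m_supported check are assumptions, not
part of this patch):

/* Hypothetical sketch: only schedule the periodic send when the MDS
 * advertises metric support. CEPHFS_FEATURE_METRIC_COLLECT and the
 * m_supported field are assumed names, not from this patch. */
static bool mds_supports_metrics(struct ceph_mds_client *mdsc)
{
	return test_bit(CEPHFS_FEATURE_METRIC_COLLECT,
			&mdsc->mdsmap->m_supported);
}

With a check like this, metric_schedule_delayed() could simply return
early when the bit is absent, so an old MDS never sees the unknown
message type at all.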


   extern int ceph_metric_init(struct ceph_client_metric *m);
   extern void ceph_metric_destroy(struct ceph_client_metric *m);
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index c9784eb1..66a940c 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1282,6 +1282,35 @@ static void __exit exit_ceph(void)
   	destroy_caches();
   }
+static int param_set_metric_interval(const char *val, const struct kernel_param *kp)
+{
+	int ret;
+	unsigned int interval;
+
+	ret = kstrtouint(val, 0, &interval);
+	if (ret < 0) {
+		pr_err("Failed to parse metric interval '%s'\n", val);
+		return ret;
+	}
+
+	if (interval > 5 || interval < 1) {
+		pr_err("Invalid metric interval %u\n", interval);
+		return -EINVAL;
+	}
+
+	metric_send_interval = interval;
+	return 0;
+}
+
+static const struct kernel_param_ops param_ops_metric_interval = {
+	.set = param_set_metric_interval,
+	.get = param_get_uint,
+};
+
+unsigned int metric_send_interval = 1;
+module_param_cb(metric_send_interval, &param_ops_metric_interval, &metric_send_interval, 0644);
+MODULE_PARM_DESC(metric_send_interval, "Interval (in seconds) of sending perf metric to ceph cluster, valid values are 1~5 (default: 1)");
+
Aren't valid values 0-5, with 0 disabling this feature? That's
what metric_schedule_delayed() seems to indicate...
What should the default value be? 0 to disable it, or 1?

I don't have strong preference here. It's probably safe enough to make
it 1. I do like the ability to turn this off though, as that gives us a
workaround in the event that it does cause trouble.

Sure, I will add it back.
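
As a hedged sketch of what the adjusted validator might look like with
0 accepted as the "off" value (this is the change discussed above, not
the code as posted):

static int param_set_metric_interval(const char *val,
				     const struct kernel_param *kp)
{
	unsigned int interval;
	int ret;

	ret = kstrtouint(val, 0, &interval);
	if (ret < 0) {
		pr_err("Failed to parse metric interval '%s'\n", val);
		return ret;
	}

	/* 0 disables the periodic send, per metric_schedule_delayed() */
	if (interval > 5) {
		pr_err("Invalid metric interval %u\n", interval);
		return -EINVAL;
	}

	metric_send_interval = interval;
	return 0;
}

Since the parameter is registered with 0644 permissions, it should also
be adjustable at runtime through
/sys/module/ceph/parameters/metric_send_interval.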



Maybe in the future we should have the ceph side send a
notification/request to all the clients when it wants to collect the
metrics, and then we could enable it here; by default it would just be
disabled.

That seems like a heavier-weight solution than is called for here. This
seems like something that ought to have a feature bit. Then if that's
turned off, we can just disable this altogether.

Okay, sounds better.

Thanks


   module_init(init_ceph);
   module_exit(exit_ceph);
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index ebf5ba6..455e9b9 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -130,6 +130,7 @@ struct ceph_dir_layout {
   #define CEPH_MSG_CLIENT_REQUEST         24
   #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
   #define CEPH_MSG_CLIENT_REPLY           26
+#define CEPH_MSG_CLIENT_METRICS         29
   #define CEPH_MSG_CLIENT_CAPS            0x310
   #define CEPH_MSG_CLIENT_LEASE           0x311
   #define CEPH_MSG_CLIENT_SNAP            0x312




