On 6/16/23 12:40, Venky Shankar wrote:
Hi Xiubo,
On Tue, Jun 6, 2023 at 6:30 AM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
When the MDS rank is in clientreplay state, the metrics requests
will be discarded directly. Also, when there are a lot of known
client requests to recover from, the metrics requests will slow
down the MDS rank from getting to the active state sooner.
With this patch, we will send the metrics requests only when the
MDS rank is in active state.
Although the changes look fine. I have a question though - the metrics
are sent by the client each second (on tick()) - how many clients were
connected for the MDS to experience further slowness due to metric
update messages?
As I remembered there were 1761 clients. The MDS will just queue the
metric messages and then trigger to print one debug log message , which
could slow down the MDS, to buffer and file(?) for each when handling them.
Thanks
- Xiubo
URL: https://tracker.ceph.com/issues/61524
Reviewed-by: Milind Changire <mchangir@xxxxxxxxxx>
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
V2:
- rephrase the commit comment from Milind's comments.
fs/ceph/metric.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index c47347d2e84e..cce78d769f55 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -36,6 +36,14 @@ static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
s32 items = 0;
s32 len;
+ /* Do not send the metrics until the MDS rank is ready */
+ mutex_lock(&mdsc->mutex);
+ if (ceph_mdsmap_get_state(mdsc->mdsmap, s->s_mds) != CEPH_MDS_STATE_ACTIVE) {
+ mutex_unlock(&mdsc->mutex);
+ return false;
+ }
+ mutex_unlock(&mdsc->mutex);
+
len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
+ sizeof(*meta) + sizeof(*dlease) + sizeof(*files)
+ sizeof(*icaps) + sizeof(*inodes) + sizeof(*rsize)
--
2.40.1