Hi,
I don't have an answer, but this reminds me of an issue we had this
year on a customer cluster. I created this tracker issue [0], where
you have been the only one to comment so far. Those observations might
not be related, but do you see any impact on the cluster?
Also, in your output "val" is still smaller than "max":
"val": 104856554,
"max": 104857600,
So it probably doesn't have any visible impact, does it? But the
values are not far apart; maybe they burst occasionally, causing the
get_or_fail_fail counter to increase? Do you have that monitored?
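A minimal monitoring sketch (my assumption on how you could watch this; the
throttle section name and the numbers are taken from your perf dump output,
the second sample and the 60 s interval are made up for illustration) that
computes the throttle headroom and the rate of get_or_fail_fail between two
samples:

```python
# Sketch: compare two "ceph daemon <asok> perf dump" samples of the mgr's
# dispatch throttle. A nonzero get_or_fail_fail rate means the throttle is
# actively rejecting acquires, even if "val" reads below "max" at poll time.
SECTION = "throttle-msgr_dispatch_throttler-mgr-0x55930f4aed20"

def throttle_stats(dump: dict) -> dict:
    t = dump[SECTION]
    return {
        "headroom": t["max"] - t["val"],          # bytes left before throttling
        "util_pct": 100.0 * t["val"] / t["max"],  # how full the throttle is
        "fail": t["get_or_fail_fail"],            # cumulative failed acquires
    }

def fails_per_sec(prev: dict, curr: dict, interval_s: float) -> float:
    """Rate of get_or_fail_fail between two samples taken interval_s apart."""
    d = curr[SECTION]["get_or_fail_fail"] - prev[SECTION]["get_or_fail_fail"]
    return d / interval_s

# First sample uses the values from the output below; the second sample's
# fail counter is a hypothetical later reading.
sample1 = {SECTION: {"val": 104856554, "max": 104857600,
                     "get_or_fail_fail": 1323887918}}
sample2 = {SECTION: {"val": 104856554, "max": 104857600,
                     "get_or_fail_fail": 1323900000}}

print(throttle_stats(sample2))
print(fails_per_sec(sample1, sample2, 60.0))
```

If the fail rate stays nonzero between polls, the throttle is saturating in
bursts that a single snapshot of "val" does not show.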
Thanks,
Eugen
[0] https://tracker.ceph.com/issues/66310
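By the way, the "max" of 104857600 bytes in your output is the default value
of ms_dispatch_throttle_bytes (100 MiB). If the dispatch throttle really
turns out to be the bottleneck, raising it for the mgr might be worth
testing; this is a guess on my side, not something I have verified on your
cluster, so please check that the option applies to your release first:

```shell
# Untested suggestion: double the mgr's dispatch throttle from the
# 100 MiB default. The new value (200 MiB) is an arbitrary example.
ceph config set mgr ms_dispatch_throttle_bytes 209715200
```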
Quoting Konstantin Shalygin <k0ste@xxxxxxxx>:
Hi, it seems something in the mgr is being throttled because val is
hitting max. Am I right?
root@mon1# ceph daemon /var/run/ceph/ceph-mgr.mon1.asok perf dump | \
    jq '."throttle-msgr_dispatch_throttler-mgr-0x55930f4aed20"'
{
  "val": 104856554,
  "max": 104857600,
  "get_started": 0,
  "get": 9700833,
  "get_sum": 654452218418,
  "get_or_fail_fail": 1323887918,
  "get_or_fail_success": 9700833,
  "take": 0,
  "take_sum": 0,
  "put": 9698716,
  "put_sum": 654347361864,
  "wait": {
    "avgcount": 0,
    "sum": 0,
    "avgtime": 0
  }
}
The question is: how to determine what exactly is being throttled?
Every other fail_fail counter in the perf counters is zero. The mgr is
not running in a container and has enough resources to work.
Thanks,
k
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx