Hi,
I don't have an answer, but this reminds me of an issue we had this
year on a customer cluster. I created this tracker issue [0], where
you have been the only one to comment so far. Those observations might
not be related, but do you see any impact on the cluster?
Also, in your output "val" is still smaller than "max":
"val": 104856554,
"max": 104857600,
So it probably doesn't have any visible impact, does it? But the
values are not far apart; maybe they burst occasionally, causing the
get_or_fail_fail counter to increase? Do you have that monitored?
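A minimal monitoring sketch (my assumption on how you could watch this; the
throttle section name and the numbers are taken from your perf dump output,
the second sample and the 60 s interval are made up for illustration) that
computes the throttle headroom and the rate of get_or_fail_fail between two
samples:

```python
# Sketch: compare two "ceph daemon <asok> perf dump" samples of the mgr's
# dispatch throttle. A nonzero get_or_fail_fail rate means the throttle is
# actively rejecting acquires, even if "val" reads below "max" at poll time.
SECTION = "throttle-msgr_dispatch_throttler-mgr-0x55930f4aed20"

def throttle_stats(dump: dict) -> dict:
    t = dump[SECTION]
    return {
        "headroom": t["max"] - t["val"],          # bytes left before throttling
        "util_pct": 100.0 * t["val"] / t["max"],  # how full the throttle is
        "fail": t["get_or_fail_fail"],            # cumulative failed acquires
    }

def fails_per_sec(prev: dict, curr: dict, interval_s: float) -> float:
    """Rate of get_or_fail_fail between two samples taken interval_s apart."""
    d = curr[SECTION]["get_or_fail_fail"] - prev[SECTION]["get_or_fail_fail"]
    return d / interval_s

# First sample uses the values from the output below; the second sample's
# fail counter is a hypothetical later reading.
sample1 = {SECTION: {"val": 104856554, "max": 104857600,
                     "get_or_fail_fail": 1323887918}}
sample2 = {SECTION: {"val": 104856554, "max": 104857600,
                     "get_or_fail_fail": 1323900000}}

print(throttle_stats(sample2))
print(fails_per_sec(sample1, sample2, 60.0))
```

If the fail rate stays nonzero between polls, the throttle is saturating in
bursts that a single snapshot of "val" does not show.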
Thanks,
Eugen
[0] https://tracker.ceph.com/issues/66310
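By the way, the "max" of 104857600 bytes in your output is the default value
of ms_dispatch_throttle_bytes (100 MiB). If the dispatch throttle really
turns out to be the bottleneck, raising it for the mgr might be worth
testing; this is a guess on my side, not something I have verified on your
cluster, so please check that the option applies to your release first:

```shell
# Untested suggestion: double the mgr's dispatch throttle from the
# 100 MiB default. The new value (200 MiB) is an arbitrary example.
ceph config set mgr ms_dispatch_throttle_bytes 209715200
```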
Quoting Konstantin Shalygin <k0ste@xxxxxxxx>:
Hi, it seems something in the mgr is being throttled because val is
hitting max. Am I right?
root@mon1# ceph daemon /var/run/ceph/ceph-mgr.mon1.asok perf dump | \
    jq '."throttle-msgr_dispatch_throttler-mgr-0x55930f4aed20"'
{
  "val": 104856554,
  "max": 104857600,
  "get_started": 0,
  "get": 9700833,
  "get_sum": 654452218418,
  "get_or_fail_fail": 1323887918,
  "get_or_fail_success": 9700833,
  "take": 0,
  "take_sum": 0,
  "put": 9698716,
  "put_sum": 654347361864,
  "wait": {
    "avgcount": 0,
    "sum": 0,
    "avgtime": 0
  }
}
The question is: how to determine what exactly is being throttled?
Every other fail_fail counter in the perf counters is zero. The mgr is
not running in a container and has enough resources to work.
Thanks,
k
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx