Laggy PGs on a fairly high performance cluster

We have an all-SSD cluster with 14 OSD nodes, and for some reason we are continually getting laggy PGs, which seem to correlate with slow requests. This happens on Quincy; it doesn't seem to happen on our Pacific clusters. The laggy PGs shift between OSDs. The network seems solid: I'm not seeing errors or slowness. The OSD hosts are heavily underutilized, normally with a load below 1 and CPUs 98% idle. I have been looking through the OSD and ceph logs, and nothing really stands out.

Some things we have tried:

  1.  Updated our cluster to 17.2.5.
  2.  Manually set our mClock profile to high_client_ops.
  3.  Increased our total number of PGs (this is something that should have happened anyway).
  4.  Verified that jumbo frames, LACP, and throughput were functioning as intended.
  5.  Took some of our newer nodes out to see if they were the issue. Also rebooted the cluster just to be sure.

I'm curious if someone in the community has experience with this kind of issue and maybe could point to something I have overlooked.

Some example logs:

2023-01-10T22:50:23.245823+0000 mgr.openstack-mon01.b.pc.ostk.com.flbudm (mgr.120371640) 231175 : cluster [DBG] pgmap v235204: 2625 pgs: 1 active+clean+laggy, 2624 active+clean; 6.0 TiB data, 18 TiB used, 84 TiB / 102 TiB avail; 19 MiB/s rd, 67 MiB/s wr, 4.76k op/s
2023-01-10T22:50:23.762562+0000 osd.83 (osd.83) 906 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
2023-01-10T22:50:24.771260+0000 osd.83 (osd.83) 907 : cluster [WRN] 6 slow requests (by type [ 'delayed' : 5 'waiting for sub ops' : 1 ] most affected pool [ 'vms' : 6 ])
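
To check whether the slow requests really move between OSDs, the warnings can be tallied per OSD from the cluster log. The following is only a rough sketch: the regular expression is inferred from the two sample lines above and may not match every variant of the message, and the function names (parse_slow_request, summarize) are made up for illustration, not part of any Ceph tooling.

```python
import re
from collections import Counter

# Pattern inferred from the sample "[WRN] N slow requests" lines above;
# it is an assumption and may need adjusting for other message variants.
SLOW_RE = re.compile(
    r"^(?P<ts>\S+) (?P<osd>osd\.\d+) \(osd\.\d+\) \d+ : cluster \[WRN\] "
    r"(?P<count>\d+) slow requests \(by type \[ (?P<types>.*?) \] "
    r"most affected pool \[ '(?P<pool>[^']+)' : \d+ \]\)"
)
TYPE_RE = re.compile(r"'([^']+)' : (\d+)")


def parse_slow_request(line):
    """Return a dict describing one slow-request warning, or None if no match."""
    m = SLOW_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m.group("ts"),
        "osd": m.group("osd"),
        "count": int(m.group("count")),
        "types": {name: int(n) for name, n in TYPE_RE.findall(m.group("types"))},
        "pool": m.group("pool"),
    }


def summarize(lines):
    """Count warning lines per OSD and track the peak slow-request count per OSD.

    Consecutive warnings re-report requests that are still in flight, so the
    peak is tracked instead of summing the counts.
    """
    warnings_per_osd = Counter()
    peak_per_osd = {}
    for line in lines:
        rec = parse_slow_request(line)
        if rec is None:
            continue  # not a slow-request warning (e.g. a pgmap line)
        warnings_per_osd[rec["osd"]] += 1
        peak_per_osd[rec["osd"]] = max(peak_per_osd.get(rec["osd"], 0), rec["count"])
    return warnings_per_osd, peak_per_osd
```

Feeding it the lines of the cluster log (for example, the contents of a saved log file split on newlines) gives a per-OSD count of warnings. If the counts are spread evenly over many OSDs rather than concentrated on a few, that would support the impression that the problem is not tied to particular disks or hosts.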


________________________________

CONFIDENTIALITY NOTICE: This message is intended only for the use and review of the individual or entity to which it is addressed and may contain information that is privileged and confidential. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message solely to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify sender immediately by telephone or return email. Thank you.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx