Hello,

Is swap enabled on your host? Is swap used? For our cluster, we tend to
allocate enough RAM and disable swap. Maybe the reboot of your host
re-activated swap? Try to disable swap and see if it helps.

All the best,
Arnaud

On Tue, Mar 29, 2022 at 23:41, David Orman <ormandj@xxxxxxxxxxxx> wrote:

> We're definitely dealing with something that sounds similar, but hard to
> state definitively without more detail. Do you have object lock/versioned
> buckets in use (especially if one started being used around the time of
> the slowdown)? Was this cluster always 16.2.7?
>
> What is your pool configuration (EC k+m or replicated X setup), and do
> you use the same pool for indexes and data? I'm assuming this is RGW
> usage via the S3 API; let us know if this is not correct.
>
> On Tue, Mar 29, 2022 at 4:13 PM Alex Closs <acloss@xxxxxxxxxxxxx> wrote:
>
> > Hey folks,
> >
> > We have a 16.2.7 cephadm cluster that's had slow ops and several
> > (constantly changing) laggy PGs. The set of OSDs with slow ops seems
> > to change at random, among all 6 OSD hosts in the cluster. All drives
> > are enterprise SATA SSDs, by either Intel or Micron. We're still not
> > ruling out a network issue, but wanted to troubleshoot from the Ceph
> > side in case something broke there.
> >
> > ceph -s:
> >
> >   health: HEALTH_WARN
> >           3 slow ops, oldest one blocked for 246 sec, daemons
> >           [osd.124,osd.130,osd.141,osd.152,osd.27] have slow ops.
> >
> >   services:
> >     mon: 5 daemons, quorum
> >          ceph-osd10,ceph-mon0,ceph-mon1,ceph-osd9,ceph-osd11 (age 28h)
> >     mgr: ceph-mon0.sckxhj(active, since 25m), standbys:
> >          ceph-osd10.xmdwfh, ceph-mon1.iogajr
> >     osd: 143 osds: 143 up (since 92m), 143 in (since 2w)
> >     rgw: 3 daemons active (3 hosts, 1 zones)
> >
> >   data:
> >     pools:   26 pools, 3936 pgs
> >     objects: 33.14M objects, 144 TiB
> >     usage:   338 TiB used, 162 TiB / 500 TiB avail
> >     pgs:     3916 active+clean
> >              19   active+clean+laggy
> >              1    active+clean+scrubbing+deep
> >
> >   io:
> >     client: 59 MiB/s rd, 98 MiB/s wr, 1.66k op/s rd, 1.68k op/s wr
> >
> > This is actually much faster than it's been for much of the past hour;
> > it's been as low as 50 KB/s and dozens of IOPS in both directions
> > (where the cluster typically does 300 MB to a few gigs, and ~4k IOPS).
> >
> > The cluster has been on 16.2.7 since a few days after release without
> > issue. The only recent change was an apt upgrade and reboot on the
> > hosts (which was last Friday and didn't show signs of problems).
> >
> > Happy to provide logs, let me know what would be useful. Thanks for
> > reading this wall :)
> >
> > -Alex
> >
> > MIT CSAIL
> > he/they
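
Following up on the swap suggestion above: a minimal check/disable sequence,
assuming a typical Linux host where the ceph-osd processes are visible from
the host namespace (paths and service layout may differ, e.g. under cephadm
containers), could look like this:

    # Is swap configured, and how much is actually in use?
    swapon --show
    free -h

    # Are the OSD processes themselves swapped out?
    for pid in $(pgrep ceph-osd); do
        awk -v p="$pid" '/VmSwap/ {print "osd pid " p ": " $2 " " $3}' /proc/$pid/status
    done

    # Disable swap until the next reboot; make it permanent by commenting
    # out the swap entries in /etc/fstab (or masking the swap unit)
    swapoff -a

    # A softer alternative if you want to keep swap configured:
    sysctl vm.swappiness=1

If swap shows as off and unused, that points away from the reboot having
re-enabled it and back toward the network or the OSDs themselves.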
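
To answer the pool-layout and versioning questions, read-only checks along
these lines should show the relevant details. The profile and bucket names
below are placeholders, and since bucket stats output varies by release, an
S3-level check via the AWS CLI is included as a fallback:

    # Replication vs. EC per pool, PG counts, autoscale settings
    ceph osd pool ls detail
    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get <profile-name>    # k, m, crush-failure-domain

    # Which pools back the RGW index and data for the zone
    radosgw-admin zone get

    # Versioning / object lock status for a suspect bucket
    radosgw-admin bucket stats --bucket=<bucket>
    aws s3api get-bucket-versioning --bucket <bucket> --endpoint-url http://<rgw-endpoint>
    aws s3api get-object-lock-configuration --bucket <bucket> --endpoint-url http://<rgw-endpoint>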
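
For the slow ops themselves, it may also be worth dumping what the flagged
OSDs are actually blocked on; the built-in heartbeat ping stats can help rule
the network in or out without leaving Ceph. osd.124 below is just one of the
OSDs from the health warning; depending on the release these may need to run
via ceph daemon inside a cephadm shell on the OSD's host rather than ceph tell:

    # What the flagged OSD is blocked on, and how long each phase took
    ceph tell osd.124 dump_ops_in_flight
    ceph tell osd.124 dump_historic_slow_ops

    # Commit/apply latency per OSD; a handful of consistently slow OSDs
    # points at drives, randomly changing ones point more toward something
    # shared (network, host)
    ceph osd perf

    # Heartbeat ping times between OSDs, useful for spotting a bad link/NIC
    ceph tell osd.124 dump_osd_network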