Hi - I've been bitten by that too and checked, and that *did* happen, but I swapped them off a while ago. Thanks for your quick reply :)

-Alex

On Mar 29, 2022, 6:26 PM -0400, Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
> Hello,
>
> Is swap enabled on your host? Is swap used?
>
> For our cluster we tend to allocate enough RAM and disable swap.
>
> Maybe the reboot of your host re-activated swap?
>
> Try disabling swap and see if it helps.
>
> All the best
>
> Arnaud
>
> On Tue, Mar 29, 2022 at 23:41, David Orman <ormandj@xxxxxxxxxxxx> wrote:
> > We're definitely dealing with something that sounds similar, but it's hard to
> > say definitively without more detail. Do you have object lock/versioned
> > buckets in use (especially if one started being used around the time of the
> > slowdown)? Was this cluster always 16.2.7?
> >
> > What is your pool configuration (EC k+m or replicated X setup), and do you
> > use the same pool for indexes and data? I'm assuming this is RGW usage via
> > the S3 API; let us know if that is not correct.
> >
> > On Tue, Mar 29, 2022 at 4:13 PM Alex Closs <acloss@xxxxxxxxxxxxx> wrote:
> > > Hey folks,
> > >
> > > We have a 16.2.7 cephadm cluster that's had slow ops and several
> > > (constantly changing) laggy PGs. The set of OSDs with slow ops seems to
> > > change at random among all 6 OSD hosts in the cluster. All drives are
> > > enterprise SATA SSDs, from either Intel or Micron. We're still not ruling
> > > out a network issue, but wanted to troubleshoot from the Ceph side in
> > > case something broke there.
> > >
> > > ceph -s:
> > >
> > >   health: HEALTH_WARN
> > >           3 slow ops, oldest one blocked for 246 sec, daemons
> > >           [osd.124,osd.130,osd.141,osd.152,osd.27] have slow ops.
> > >
> > >   services:
> > >     mon: 5 daemons, quorum
> > >          ceph-osd10,ceph-mon0,ceph-mon1,ceph-osd9,ceph-osd11 (age 28h)
> > >     mgr: ceph-mon0.sckxhj(active, since 25m), standbys: ceph-osd10.xmdwfh,
> > >          ceph-mon1.iogajr
> > >     osd: 143 osds: 143 up (since 92m), 143 in (since 2w)
> > >     rgw: 3 daemons active (3 hosts, 1 zones)
> > >
> > >   data:
> > >     pools:   26 pools, 3936 pgs
> > >     objects: 33.14M objects, 144 TiB
> > >     usage:   338 TiB used, 162 TiB / 500 TiB avail
> > >     pgs:     3916 active+clean
> > >              19   active+clean+laggy
> > >              1    active+clean+scrubbing+deep
> > >
> > >   io:
> > >     client: 59 MiB/s rd, 98 MiB/s wr, 1.66k op/s rd, 1.68k op/s wr
> > >
> > > This is actually much faster than it's been for much of the past hour;
> > > it's been as low as 50 kB/s and dozens of IOPS in both directions (where
> > > the cluster typically does 300 MB/s to a few GB/s, and ~4k IOPS).
> > >
> > > The cluster has been on 16.2.7 since a few days after release without
> > > issue. The only recent change was an apt upgrade and reboot on the hosts
> > > (which was last Friday and didn't show signs of problems).
> > >
> > > Happy to provide logs, let me know what would be useful. Thanks for
> > > reading this wall :)
> > >
> > > -Alex
> > >
> > > MIT CSAIL
> > > he/they
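
For reference on the swap question above, a minimal way to check whether swap is active on a host and to turn it off is sketched below. This is a generic Linux sketch, not something taken from the thread; the /etc/fstab entry shown is an invented example, so adjust it to whatever your hosts actually have.

  # Is any swap device or file currently active?
  swapon --show
  free -h

  # Turn off all active swap immediately (lasts until the next reboot)
  sudo swapoff -a

  # To keep swap off across reboots, comment out or remove the swap entry
  # in /etc/fstab, e.g. a line such as (example only, yours will differ):
  #   /dev/mapper/vg0-swap  none  swap  sw  0  0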
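
Similarly, for digging into David's questions about pool layout and into which requests are actually stuck, a few read-only Ceph commands are sketched below. These are standard Ceph commands, but osd.124 is just one of the OSDs named in the health warning and <profile-name> is a placeholder; the ceph daemon calls have to run on the host that carries that OSD (with cephadm, e.g. enter it via cephadm shell --name osd.124 first).

  # Which daemons report slow ops, and what exactly is HEALTH_WARN about?
  ceph health detail

  # Replicated size / EC profile per pool, and which pools back RGW data vs. index
  ceph osd pool ls detail
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get <profile-name>

  # Inspect the in-flight and recent slow requests on one of the affected OSDs
  ceph daemon osd.124 dump_ops_in_flight
  ceph daemon osd.124 dump_historic_slow_ops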