Good morning everyone.

Today my cluster had a "problem": it started showing SLOW_OPS, and after restarting the OSDs that reported it everything went back to normal (some VMs were stuck because of this). What I'm racking my brain over is the reason for the SLOW_OPS in the first place. The logs show the problem started at 04:00 AM and continued until 07:50 AM, when I restarted the OSDs.

I suspect some exaggerated settings I applied during the initial setup for a test and then forgot about, which may have driven RAM usage up to the point where only about 400 MB of the 32 GB was left free. Specifically, I set 512 PGs on two pools, one of which was the affected pool.

According to the logs the problem started when some VMs began their backup jobs, which raised the write load a little (to a maximum of about 300 MB/s). A few seconds later one disk started showing this WARN, along with this line:

Dec 14 04:01:01 dcs1.evocorp ceph-mon[639148]: 69 slow requests (by type [ 'delayed' : 65 'waiting for sub ops' : 4 ] most affected pool [ 'cephfs.ds_disk.data' : 69])

Then it showed these:

Dec 14 04:01:02 dcs1.evocorp ceph-mon[639148]: log_channel(cluster) log [WRN] : Health check update: 0 slow ops, oldest one blocked for 36 sec, daemons [osd.20,osd.5] have slow ops. (SLOW_OPS)
[...]
Dec 14 05:52:01 dcs1.evocorp ceph-mon[639148]: log_channel(cluster) log [WRN] : Health check update: 149 slow ops, oldest one blocked for 6696 sec, daemons [osd.20,osd.5,osd.50] have slow ops. (SLOW_OPS)

I have already checked SMART and all disks are OK, the Grafana graphs show none of the disks saturating, and there were no network incidents. In other words, I haven't identified any other problem that could explain this.

What could have caused this event? What can I do to prevent it from happening again?

Below is some information about the cluster: 5 machines, each with 32 GB RAM, 2 processors and 12 x 3 TB SAS disks, connected through 40 Gb interfaces.
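In case it helps, these are the checks I am planning to run next to dig into the memory/PG theory. This is only a rough sketch (command availability and output can vary with the Ceph release), and the memory-target value at the end is just something I am considering, not something I have applied yet:

# what is currently blocked and on which OSDs
ceph health detail

# per-OSD view of the slow requests (run on the node hosting the OSD, e.g. dcs1 for osd.5)
ceph daemon osd.5 dump_ops_in_flight
ceph daemon osd.5 dump_historic_slow_ops

# memory budget per OSD daemon; the default of 4 GiB x 12 OSDs per host would not fit in 32 GB
ceph config get osd osd_memory_target

# PG count on the affected pool and what the autoscaler recommends
ceph osd pool get cephfs.ds_disk.data pg_num
ceph osd pool autoscale-status

# candidate mitigation (NOT applied yet): cap each OSD at ~2 GiB so 12 OSDs fit in 32 GB
# ceph config set osd osd_memory_target 2147483648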
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME      STATUS  REWEIGHT  PRI-AFF
 -1         163.73932  root default
 -3          32.74786      host dcs1
  0    hdd    2.72899          osd.0      up   1.00000  1.00000
  1    hdd    2.72899          osd.1      up   1.00000  1.00000
  2    hdd    2.72899          osd.2      up   1.00000  1.00000
  3    hdd    2.72899          osd.3      up   1.00000  1.00000
  4    hdd    2.72899          osd.4      up   1.00000  1.00000
  5    hdd    2.72899          osd.5      up   1.00000  1.00000
  6    hdd    2.72899          osd.6      up   1.00000  1.00000
  7    hdd    2.72899          osd.7      up   1.00000  1.00000
  8    hdd    2.72899          osd.8      up   1.00000  1.00000
  9    hdd    2.72899          osd.9      up   1.00000  1.00000
 10    hdd    2.72899          osd.10     up   1.00000  1.00000
 11    hdd    2.72899          osd.11     up   1.00000  1.00000
 -5          32.74786      host dcs2
 12    hdd    2.72899          osd.12     up   1.00000  1.00000
 13    hdd    2.72899          osd.13     up   1.00000  1.00000
 14    hdd    2.72899          osd.14     up   1.00000  1.00000
 15    hdd    2.72899          osd.15     up   1.00000  1.00000
 16    hdd    2.72899          osd.16     up   1.00000  1.00000
 17    hdd    2.72899          osd.17     up   1.00000  1.00000
 18    hdd    2.72899          osd.18     up   1.00000  1.00000
 19    hdd    2.72899          osd.19     up   1.00000  1.00000
 20    hdd    2.72899          osd.20     up   1.00000  1.00000
 21    hdd    2.72899          osd.21     up   1.00000  1.00000
 22    hdd    2.72899          osd.22     up   1.00000  1.00000
 23    hdd    2.72899          osd.23     up   1.00000  1.00000
 -7          32.74786      host dcs3
 24    hdd    2.72899          osd.24     up   1.00000  1.00000
 25    hdd    2.72899          osd.25     up   1.00000  1.00000
 26    hdd    2.72899          osd.26     up   1.00000  1.00000
 27    hdd    2.72899          osd.27     up   1.00000  1.00000
 28    hdd    2.72899          osd.28     up   1.00000  1.00000
 29    hdd    2.72899          osd.29     up   1.00000  1.00000
 30    hdd    2.72899          osd.30     up   1.00000  1.00000
 31    hdd    2.72899          osd.31     up   1.00000  1.00000
 32    hdd    2.72899          osd.32     up   1.00000  1.00000
 33    hdd    2.72899          osd.33     up   1.00000  1.00000
 34    hdd    2.72899          osd.34     up   1.00000  1.00000
 35    hdd    2.72899          osd.35     up   1.00000  1.00000
 -9          32.74786      host dcs4
 36    hdd    2.72899          osd.36     up   1.00000  1.00000
 37    hdd    2.72899          osd.37     up   1.00000  1.00000
 38    hdd    2.72899          osd.38     up   1.00000  1.00000
 39    hdd    2.72899          osd.39     up   1.00000  1.00000
 40    hdd    2.72899          osd.40     up   1.00000  1.00000
 41    hdd    2.72899          osd.41     up   1.00000  1.00000
 42    hdd    2.72899          osd.42     up   1.00000  1.00000
 43    hdd    2.72899          osd.43     up   1.00000  1.00000
 44    hdd    2.72899          osd.44     up   1.00000  1.00000
 45    hdd    2.72899          osd.45     up   1.00000  1.00000
 46    hdd    2.72899          osd.46     up   1.00000  1.00000
 47    hdd    2.72899          osd.47     up   1.00000  1.00000
-11          32.74786      host dcs5
 48    hdd    2.72899          osd.48     up   1.00000  1.00000
 49    hdd    2.72899          osd.49     up   1.00000  1.00000
 50    hdd    2.72899          osd.50     up   1.00000  1.00000
 51    hdd    2.72899          osd.51     up   1.00000  1.00000
 52    hdd    2.72899          osd.52     up   1.00000  1.00000
 53    hdd    2.72899          osd.53     up   1.00000  1.00000
 54    hdd    2.72899          osd.54     up   1.00000  1.00000
 55    hdd    2.72899          osd.55     up   1.00000  1.00000
 56    hdd    2.72899          osd.56     up   1.00000  1.00000
 57    hdd    2.72899          osd.57     up   1.00000  1.00000
 58    hdd    2.72899          osd.58     up   1.00000  1.00000
 59    hdd    2.72899          osd.59     up   1.00000  1.00000
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx