Hello Uwe,

as described in my mail we are running 4.13.0-39. In conjunction with some later mails in this thread it seems that this problem might be related to OS/microcode (Spectre) updates. I am planning a Ceph/Ubuntu upgrade next week for various reasons anyway, so let's see what happens...

Regards
Marc

On 05.09.2018 at 20:24, Uwe Sauter wrote:
> I'm also experiencing slow requests, though I cannot tie them to scrubbing.
>
> Which kernel do you run? Would you be able to test against the same kernel with the
> Spectre/Meltdown mitigations disabled ("noibrs noibpb nopti nospectre_v2" as boot options)?
>
> Uwe
>
> On 05.09.18 at 19:30, Brett Chancellor wrote:
>> Marc,
>>   As with you, this problem manifests itself only when a bluestore OSD is involved in
>> some form of deep scrub. Does anybody have any insight into what might be causing this?
>>
>> -Brett
>>
>> On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> we have also been experiencing this type of behavior for some weeks on our
>> not-so-performance-critical HDD pools.
>> We haven't spent much time on this problem yet because there are currently more
>> important tasks, but here are a few details:
>>
>> Running the following loop produces the output below:
>>
>>     while true; do ceph health | grep -q HEALTH_OK || (date; ceph health detail); sleep 2; done
>>
>> Sun Sep 2 20:59:47 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep 2 20:59:50 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep 2 20:59:52 CEST 2018
>> HEALTH_OK
>> Sun Sep 2 21:00:28 CEST 2018
>> HEALTH_WARN 1 slow requests are blocked > 32 sec
>> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>>     1 ops are blocked > 32.768 sec
>>     osd.41 has blocked requests > 32.768 sec
>> Sun Sep 2 21:00:31 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,41 have blocked requests > 32.768 sec
>> Sun Sep 2 21:00:33 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>> Sun Sep 2 21:00:35 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>>
>> Our details:
>>
>>  * System details:
>>    * Ubuntu 16.04
>>    * Kernel 4.13.0-39
>>    * 30 x 8 TB disks (SEAGATE ST8000NM0075)
>>    * 3 x Dell PowerEdge R730xd (firmware 2.50.50.50)
>>    * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>    * 2 x 10 GBit/s SFP+ network adapters
>>    * 192 GB RAM
>>  * Pools use replication factor 3, 2 MB object size, 85% write load,
>>    1700 write IOPS (ops mainly between 4k and 16k in size), 300 read IOPS
>>  * We have the impression that this appears during deep-scrub/scrub activity.
>>  * Ceph 12.2.5; we already played with the following OSD settings
>>    (our assumption was that the problem is related to RocksDB compaction;
>>    a ceph.conf sketch follows this list):
>>      bluestore cache kv max = 2147483648
>>      bluestore cache kv ratio = 0.9
>>      bluestore cache meta ratio = 0.1
>>      bluestore cache size hdd = 10737418240
>>  * This type of problem only appears on HDD/bluestore OSDs; SSD/bluestore OSDs
>>    have never experienced it.
>>  * The system is healthy: no swapping, no high load, no errors in dmesg.
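>> In ceph.conf on the OSD hosts, the values above correspond roughly to the
>> snippet below. This is a sketch only: the [osd] section is the usual place
>> for these options, and the OSDs typically need a restart before the cache
>> sizes take effect.
>>
>>     [osd]
>>     bluestore_cache_size_hdd   = 10737418240   # 10 GiB total cache per HDD-backed OSD
>>     bluestore_cache_kv_max     = 2147483648    # cap the RocksDB (KV) share of the cache at 2 GiB
>>     bluestore_cache_kv_ratio   = 0.9           # 90% of the cache for KV (RocksDB) data
>>     bluestore_cache_meta_ratio = 0.1           # 10% of the cache for onode metadata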
>> I have attached a log excerpt of osd.35 - it is probably useful for
>> investigating the problem if someone has deeper bluestore knowledge.
>> (The slow requests appeared on Sun Sep 2 21:00:35.)
>>
>> Regards
>> Marc
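For anyone who wants to dig deeper than "ceph health detail": a quick sketch of how to
see which ops are actually stuck on one of the flagged OSDs. Run this on the host that
carries the OSD; it assumes the admin socket is in its default location.

    # list the ops currently in flight on osd.35, including how long each has been in flight
    ceph daemon osd.35 dump_ops_in_flight

    # dump recently completed ops with their per-stage event timestamps
    ceph daemon osd.35 dump_historic_ops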
>> On 02.09.2018 at 15:50, Brett Chancellor wrote:
>> > The warnings look like this:
>> >
>> > 6 ops are blocked > 32.768 sec on osd.219
>> > 1 osds have slow requests
>> >
>> > On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> >
>> > On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor <bchancellor@xxxxxxxxxxxxxx> wrote:
>> > > Hi Cephers,
>> > >   I am in the process of upgrading a cluster from Filestore to bluestore, but I'm
>> > > concerned about frequent warnings popping up against the new bluestore devices.
>> > > I'm frequently seeing messages like this; although the specific OSD changes, it's
>> > > always one of the few hosts I've converted to bluestore.
>> > >
>> > > 6 ops are blocked > 32.768 sec on osd.219
>> > > 1 osds have slow requests
>> > >
>> > > I'm running 12.2.4, have any of you seen similar issues? It seems as though these
>> > > messages pop up more frequently when one of the bluestore PGs is involved in a
>> > > scrub. I'll include my bluestore creation process below, in case that might cause
>> > > an issue. (sdb, sdc, sdd are SATA, sde and sdf are SSD.)
>> >
>> > Would be useful to include what those warnings say. The ceph-volume commands
>> > look OK to me.
>> >
>> > > ## Process used to create osds
>> > > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde /dev/sdf
>> > > sudo ceph-volume lvm zap /dev/sdb
>> > > sudo ceph-volume lvm zap /dev/sdc
>> > > sudo ceph-volume lvm zap /dev/sdd
>> > > sudo ceph-volume lvm zap /dev/sde
>> > > sudo ceph-volume lvm zap /dev/sdf
>> > > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb" /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc" /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd" /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde" /dev/sdf
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdb --block.db /dev/sdf1
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdc --block.db /dev/sdf2
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdd --block.db /dev/sdf3
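A side note on the creation process quoted above: to rule the on-disk layout out, one
way to double-check how each block.db partition actually got attached to its OSD is to
list what ceph-volume knows about the OSDs on that host. A sketch, assuming a
ceph-volume version that provides the "lvm list" subcommand:

    # on the converted OSD host: show the data device and block.db device for every ceph-volume OSD
    sudo ceph-volume lvm list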