Re: Slow requests from bluestore osds

Hello cephers,

As described, we also see these slow requests in our setup.

We recently updated from Ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the latest patch level (with kernel 4.15.0-43) and applied Dell firmware 2.8.0.

On 12.2.5 (before updating the cluster) we saw slow requests every 10 to 30 minutes throughout the entire deep-scrub window between 8:00 PM and 6:00 AM.
They were especially frequent between 4:00 AM and 6:00 AM, when we sequentially create an rbd snapshot for every rbd image and delete an outdated one (we keep 3 snapshots per rbd device).

After the upgrade to 12.2.10 (and the other patches) the slow requests seem to be reduced, but they still occur after the snapshot creation/deletion procedure.
Today we moved the creation/deletion procedure from 4:00 AM to 7:30 PM and saw slow requests right in the snapshot process at 8:00 PM.
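For context, a minimal sketch of such a rotation step (pool name, snapshot naming and the exact rbd invocations here are placeholders, not our real production script):

```shell
#!/bin/bash
# Sketch of a per-image snapshot rotation (placeholder names, not our real script).
KEEP=3          # snapshots kept per rbd device
POOL=rbd        # placeholder pool name

# Pure helper: given snapshot names oldest-first on stdin,
# print the ones beyond the newest $KEEP (GNU head's negative -n).
snaps_to_delete() {
    head -n -"$KEEP"
}

rotate_image() {
    local img=$1
    # create a fresh snapshot for this image
    rbd snap create "$POOL/$img@auto-$(date +%Y%m%d-%H%M)"
    # 'rbd snap ls' lists snapshots oldest first; column 2 is the name
    rbd snap ls "$POOL/$img" | awk 'NR>1 {print $2}' | snaps_to_delete |
        while read -r snap; do
            rbd snap rm "$POOL/$img@$snap"
        done
}
```

The snaptrim load on the osds comes from the `rbd snap rm` calls: deleting a snapshot queues the trimming work on the osds holding the affected pgs.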

The slow requests only happen on the osds of a certain storage class (30 * 8TB spinners) - i.e. ssd osds on the same cluster do not have this problem.
The pools which use this storage class see about 80% write requests.

Our configuration looks like this:
---
bluestore cache kv max = 2147483648
bluestore cache kv ratio = 0.9
bluestore cache meta ratio = 0.1
bluestore cache size hdd = 10737418240
osd deep scrub interval = 2592000
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub load threshold = 4
osd scrub sleep = 0.3
osd max trimming pgs = 2
---
We do not have that many devices in this storage class (an enhancement is in progress to get more iops).

What can I do to decrease the impact of snaptrims and prevent slow requests?
(e.g. reduce "osd max trimming pgs" to "1")
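A change we are considering, sketched as a config fragment (the values are untested guesses on our side, not verified recommendations):
---
osd max trimming pgs = 1
osd snap trim sleep = 0.1
osd snap trim priority = 1
---
Presumably these can also be injected at runtime via "ceph tell osd.* injectargs", although some sleep-type options may only take effect after an osd restart.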

Regards
Marc Schöchlin

Am 03.09.18 um 10:13 schrieb Marc Schöchlin:
> Hi,
>
> we are also experiencing this type of behavior for some weeks on our
> not-so-performance-critical hdd pools.
> We haven't spent much time on this problem yet, because there are
> currently more important tasks - but here are a few details:
>
> Running the following loop results in the following output:
>
> while true; do ceph health|grep -q HEALTH_OK || (date;  ceph health
> detail); sleep 2; done
>
> Sun Sep  2 20:59:47 CEST 2018
> HEALTH_WARN 4 slow requests are blocked > 32 sec
> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>     4 ops are blocked > 32.768 sec
>     osd.43 has blocked requests > 32.768 sec
> Sun Sep  2 20:59:50 CEST 2018
> HEALTH_WARN 4 slow requests are blocked > 32 sec
> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>     4 ops are blocked > 32.768 sec
>     osd.43 has blocked requests > 32.768 sec
> Sun Sep  2 20:59:52 CEST 2018
> HEALTH_OK
> Sun Sep  2 21:00:28 CEST 2018
> HEALTH_WARN 1 slow requests are blocked > 32 sec
> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>     1 ops are blocked > 32.768 sec
>     osd.41 has blocked requests > 32.768 sec
> Sun Sep  2 21:00:31 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,41 have blocked requests > 32.768 sec
> Sun Sep  2 21:00:33 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,51 have blocked requests > 32.768 sec
> Sun Sep  2 21:00:35 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,51 have blocked requests > 32.768 sec
>
> Our details:
>
>   * system details:
>     * Ubuntu 16.04
>      * Kernel 4.13.0-39
>      * 30 * 8 TB disks (SEAGATE/ST8000NM0075)
>      * 3 * Dell PowerEdge R730xd (Firmware 2.50.50.50)
>        * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>        * 2 * 10GBit/s SFP+ network adapters
>        * 192GB RAM
>      * Pools are using replication factor 3, 2MB object size,
>        85% write load, 1700 write IOPS
>        (ops mainly between 4k and 16k size), 300 read IOPS
>   * we have the impression that this appears during deep-scrub/scrub activity.
>   * Ceph 12.2.5; we already played with the OSD settings below
>     (our assumption was that the problem is related to rocksdb compaction):
>     bluestore cache kv max = 2147483648
>     bluestore cache kv ratio = 0.9
>     bluestore cache meta ratio = 0.1
>     bluestore cache size hdd = 10737418240
>   * this type of problem only appears on hdd/bluestore osds; ssd/bluestore
>     osds never experienced it
>   * the system is healthy, no swapping, no high load, no errors in dmesg
>
> I attached a log excerpt of osd.35 - probably this is useful for
> investigating the problem if someone has deeper bluestore knowledge.
> (slow requests appeared on Sun Sep  2 21:00:35)
>
> Regards
> Marc
>
>
> Am 02.09.2018 um 15:50 schrieb Brett Chancellor:
>> The warnings look like this. 
>>
>> 6 ops are blocked > 32.768 sec on osd.219
>> 1 osds have slow requests
>>
>> On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <adeza@xxxxxxxxxx
>> <mailto:adeza@xxxxxxxxxx>> wrote:
>>
>>     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>>     <bchancellor@xxxxxxxxxxxxxx <mailto:bchancellor@xxxxxxxxxxxxxx>>
>>     wrote:
>>     > Hi Cephers,
>>     >   I am in the process of upgrading a cluster from Filestore to
>>     bluestore,
>>     > but I'm concerned about frequent warnings popping up against the new
>>     > bluestore devices. I'm frequently seeing messages like this,
>>     although the
>>     > specific osd changes, it's always one of the few hosts I've
>>     converted to
>>     > bluestore.
>>     >
>>     > 6 ops are blocked > 32.768 sec on osd.219
>>     > 1 osds have slow requests
>>     >
>>     > I'm running 12.2.4, have any of you seen similar issues? It
>>     seems as though
>>     > these messages pop up more frequently when one of the bluestore
>>     pgs is
>>     > involved in a scrub.  I'll include my bluestore creation process
>>     below, in
>>     > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
>>     sdf are
>>     > SSD)
>>
>>     Would be useful to include what those warnings say. The ceph-volume
>>     commands look OK to me
>>
>>     >
>>     >
>>     > ## Process used to create osds
>>     > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde
>>     /dev/sdf
>>     > sudo ceph-volume lvm zap /dev/sdb
>>     > sudo ceph-volume lvm zap /dev/sdc
>>     > sudo ceph-volume lvm zap /dev/sdd
>>     > sudo ceph-volume lvm zap /dev/sde
>>     > sudo ceph-volume lvm zap /dev/sdf
>>     > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb"
>>     /dev/sdf
>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc"
>>     /dev/sdf
>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd"
>>     /dev/sdf
>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde"
>>     /dev/sdf
>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>     --data
>>     > /dev/sdb --block.db /dev/sdf1
>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>     --data
>>     > /dev/sdc --block.db /dev/sdf2
>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>     --data
>>     > /dev/sdd --block.db /dev/sdf3
>>     >
>>     >
>>     > _______________________________________________
>>     > ceph-users mailing list
>>     > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>>     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>     >
>>
>>
>>
>
>
