Re: Slow requests from bluestore osds

Hi Marc,

Try compacting the OSD with slow requests:

ceph tell osd.[ID] compact

This will take the OSD offline for a few seconds (SSD) to minutes (HDD) while it compacts its OMAP database.
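When several OSDs are implicated, the IDs can be pulled straight out of `ceph health detail`. A minimal sketch (dry run only; the sample text stands in for live command output, and the actual `ceph tell` call is left commented out):

```shell
#!/bin/sh
# Extract OSD IDs implicated in slow requests and compact each one.
# The sample below stands in for live `ceph health detail` output.
health_output='HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51
REQUEST_SLOW 324 slow requests are blocked > 32 sec. Implicated osds 51
    324 ops are blocked > 32.768 sec
    osd.51 has blocked requests > 32.768 sec'

# Pull the numeric IDs out of "osd.<ID> has blocked requests" lines.
slow_osds=$(printf '%s\n' "$health_output" \
    | sed -n 's/^ *osd\.\([0-9][0-9]*\) has blocked requests.*/\1/p' \
    | sort -un)

for id in $slow_osds; do
    echo "would run: ceph tell osd.$id compact"
    # ceph tell "osd.$id" compact    # uncomment on a live cluster
done
```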

Regards,




-----Original Message-----
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Marc Schöchlin
Sent: Monday, May 13, 2019 6:59
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Slow requests from bluestore osds

Hello cephers,

One week ago we replaced the bluestore cache size settings with "osd memory target" and removed the detailed memory settings.
This storage class now runs 42 * 8 TB spinners with a permanent write workload of 2000-3000 write IOPS and 1200-8000 read IOPS.

Our new setup is now:
(12.2.10 on Ubuntu 16.04)

[osd]
osd deep scrub interval = 2592000
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub load threshold = 6
osd scrub sleep = 0.3
osd snap trim sleep = 0.4
pg max concurrent snap trims = 1

[osd.51]
osd memory target = 8589934592
...
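For reference, the byte value above is exactly 8 GiB; a quick arithmetic check, with one way to read the effective value back at runtime shown as a comment:

```shell
# Confirm that 8589934592 is exactly 8 GiB, as set for "osd memory target".
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
target=$(gib_to_bytes 8)
echo "osd memory target = $target"
# On a running cluster, the effective value can be read back with e.g.:
#   ceph daemon osd.51 config get osd_memory_target
```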

After that (restarting the entire cluster with these settings) we were very happy to see no slow requests for 7 days.

Unfortunately, tonight the slow requests returned on one OSD, without any known change in the workload over the last 14 days (according to our detailed monitoring):

2019-05-12 22:00:00.000117 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-12 23:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:00.000129 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:44.069793 mon.ceph-mon-s43 [WRN] Health check failed: 416 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:50.151190 mon.ceph-mon-s43 [WRN] Health check update: 439 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:59.750398 mon.ceph-mon-s43 [WRN] Health check update: 452 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:04.750697 mon.ceph-mon-s43 [WRN] Health check update: 283 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:10.419801 mon.ceph-mon-s43 [WRN] Health check update: 230 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:19.751516 mon.ceph-mon-s43 [WRN] Health check update: 362 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:24.751822 mon.ceph-mon-s43 [WRN] Health check update: 324 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:30.675160 mon.ceph-mon-s43 [WRN] Health check update: 341 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:38.759012 mon.ceph-mon-s43 [WRN] Health check update: 390 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:44.858392 mon.ceph-mon-s43 [WRN] Health check update: 366 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:54.753388 mon.ceph-mon-s43 [WRN] Health check update: 352 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:59.045220 mon.ceph-mon-s43 [INF] Health check cleared: REQUEST_SLOW (was: 168 slow requests are blocked > 32 sec. Implicated osds 51)
2019-05-13 00:01:59.045257 mon.ceph-mon-s43 [INF] Cluster is now healthy
2019-05-13 01:00:00.000114 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 02:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK


The output of a "ceph health detail" loop at the time the problem occurred:

Mon May 13 00:01:27 CEST 2019
HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51
REQUEST_SLOW 324 slow requests are blocked > 32 sec. Implicated osds 51
    324 ops are blocked > 32.768 sec
    osd.51 has blocked requests > 32.768 sec

The logfile of the OSD:

2019-05-12 23:57:28.767463 7f38da4e2700  4 rocksdb: (Original Log Time 2019/05/12-23:57:28.767419) [/build/ceph-12.2.10/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 4 21 122 0 0 0] max score 0.94

2019-05-12 23:57:28.767511 7f38da4e2700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_files.cc:388] [JOB 2991] Try to delete WAL files size 256700142, prev total WAL file size 257271487, number of live WAL files 2.

2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 34.ac scrub ok
2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub starts
2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub ok
2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 from recycle list

2019-05-13 00:04:37.613186 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #422511. Immutable memtables: 0.
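To correlate the slow requests with OSD activity, one approach is to filter the OSD log by timestamp. A sketch, assuming the usual log location (/var/log/ceph/ceph-osd.51.log); the heredoc below stands in for the real file:

```shell
# Print every OSD log line inside a time window around the slow requests
# (REQUEST_SLOW ran from 00:00:44 to 00:01:59 per the mon log above).
window=$(awk -v start="2019-05-12 23:59:00" -v end="2019-05-13 00:03:00" \
    '{ ts = $1 " " $2 } ts >= start && ts <= end' <<'EOF'
2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 34.ac scrub ok
2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub starts
2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub ok
2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 from recycle list
EOF
)
printf '%s\n' "$window"
```

Notably, in this excerpt the 34.236 scrub (23:59:54 - 00:02:21) brackets the whole slow-request window (00:00:44 - 00:01:59).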

Any hints on how to find more details about the origin of this problem?
How can we solve that?

Regards
Marc

Am 28.01.19 um 22:27 schrieb Marc Schöchlin:
> Hello cephers,
>
> as described - we also have the slow requests in our setup.
>
> We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the latest patchlevel (with kernel 4.15.0-43) and applied dell firmware 2.8.0.
>
> On 12.2.5 (before updating the cluster) we had slow requests every 10 to 30 minutes throughout the entire deep-scrub window between 8:00 PM and 6:00 AM.
> Especially between 04:00 AM and 06:00 AM, when we sequentially create an rbd snapshot for every rbd image and delete an outdated snapshot (we keep 3 snapshots per rbd device).
>
> After the upgrade to 12.2.10 (and the other patches) slow requests seem to be reduced, but they still occur after the snapshot creation/deletion procedure.
> Today we changed the time of the creation/deletion procedure from 4:00 AM to 7:30 PM, and we experienced slow requests right in the snapshot process at 8:00 PM.
>
> The slow requests only happen on the OSDs of a certain storage class
> (30 * 8 TB spinners) - i.e. SSD OSDs on the same cluster do not have this problem.
> The pools which use this storage class see a load of 80% write requests.
>
> Our configuration looks like this:
> ---
> bluestore cache kv max = 2147483648
> bluestore cache kv ratio = 0.9
> bluestore cache meta ratio = 0.1
> bluestore cache size hdd = 10737418240
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 4
> osd scrub sleep = 0.3
> osd max trimming pgs = 2
> ---
> We do not have that many devices in this storage class (an enhancement
> is in progress to get more IOPS).
>
> What can I do to decrease the impact of snaptrims to prevent slow requests?
> (e.g. reduce "osd max trimming pgs" to "1")
>
> Regards
> Marc Schöchlin
>
> Am 03.09.18 um 10:13 schrieb Marc Schöchlin:
>> Hi,
>>
>> we are also experiencing this type of behavior for some weeks on our 
>> not so performance critical hdd pools.
>> We haven't spent so much time on this problem, because there are 
>> currently more important tasks - but here are a few details:
>>
>> Running the following loop results in the following output:
>>
>> while true; do ceph health | grep -q HEALTH_OK || (date; ceph health detail); sleep 2; done
>>
>> Sun Sep  2 20:59:47 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep  2 20:59:50 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>>     4 ops are blocked > 32.768 sec
>>     osd.43 has blocked requests > 32.768 sec
>> Sun Sep  2 20:59:52 CEST 2018
>> HEALTH_OK
>> Sun Sep  2 21:00:28 CEST 2018
>> HEALTH_WARN 1 slow requests are blocked > 32 sec
>> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>>     1 ops are blocked > 32.768 sec
>>     osd.41 has blocked requests > 32.768 sec
>> Sun Sep  2 21:00:31 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,41 have blocked requests > 32.768 sec
>> Sun Sep  2 21:00:33 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>> Sun Sep  2 21:00:35 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>>     7 ops are blocked > 32.768 sec
>>     osds 35,51 have blocked requests > 32.768 sec
>>
>> Our details:
>>
>>   * system details:
>>     * Ubuntu 16.04
>>      * Kernel 4.13.0-39
>>      * 30 * 8 TB Disk (SEAGATE/ST8000NM0075)
>>      * 3* Dell Power Edge R730xd (Firmware 2.50.50.50)
>>        * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>>        * 2*10GBITS SFP+ Network Adapters
>>        * 192GB RAM
>>      * Pools are using replication factor 3, 2MB object size,
>>        85% write load, 1700 write IOPS/sec
>>        (ops mainly between 4k and 16k size), 300 read IOPS/sec
>>   * we have the impression that this appears on deepscrub/scrub activity.
>>   * Ceph 12.2.5, we already played with the OSD settings below
>>     (our assumption was that the problem is related to rocksdb compaction)
>>     bluestore cache kv max = 2147483648
>>     bluestore cache kv ratio = 0.9
>>     bluestore cache meta ratio = 0.1
>>     bluestore cache size hdd = 10737418240
>>   * this type of problem only appears on hdd/bluestore osds; ssd/bluestore
>>     osds never experienced this problem
>>   * the system is healthy, no swapping, no high load, no errors in 
>> dmesg
>>
>> I attached a log excerpt of osd.35 - probably this is useful for
>> investigating the problem if someone has deeper bluestore knowledge.
>> (slow requests appeared on Sun Sep  2 21:00:35)
>>
>> Regards
>> Marc
>>
>>
>> Am 02.09.2018 um 15:50 schrieb Brett Chancellor:
>>> The warnings look like this.
>>>
>>> 6 ops are blocked > 32.768 sec on osd.219
>>> 1 osds have slow requests
>>>
>>> On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>>
>>>     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>>>     <bchancellor@xxxxxxxxxxxxxx> wrote:
>>>     > Hi Cephers,
>>>     >   I am in the process of upgrading a cluster from Filestore to
>>>     bluestore,
>>>     > but I'm concerned about frequent warnings popping up against the new
>>>     > bluestore devices. I'm frequently seeing messages like this,
>>>     although the
>>>     > specific osd changes, it's always one of the few hosts I've
>>>     converted to
>>>     > bluestore.
>>>     >
>>>     > 6 ops are blocked > 32.768 sec on osd.219
>>>     > 1 osds have slow requests
>>>     >
>>>     > I'm running 12.2.4, have any of you seen similar issues? It
>>>     seems as though
>>>     > these messages pop up more frequently when one of the bluestore
>>>     pgs is
>>>     > involved in a scrub.  I'll include my bluestore creation process
>>>     below, in
>>>     > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
>>>     sdf are
>>>     > SSD)
>>>
>>>     Would be useful to include what those warnings say. The ceph-volume
>>>     commands look OK to me
>>>
>>>     >
>>>     >
>>>     > ## Process used to create osds
>>>     > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
>>>     > sudo ceph-volume lvm zap /dev/sdb
>>>     > sudo ceph-volume lvm zap /dev/sdc
>>>     > sudo ceph-volume lvm zap /dev/sdd
>>>     > sudo ceph-volume lvm zap /dev/sde
>>>     > sudo ceph-volume lvm zap /dev/sdf
>>>     > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb"
>>>     /dev/sdf
>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc"
>>>     /dev/sdf
>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd"
>>>     /dev/sdf
>>>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde"
>>>     /dev/sdf
>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>     --data
>>>     > /dev/sdb --block.db /dev/sdf1
>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>     --data
>>>     > /dev/sdc --block.db /dev/sdf2
>>>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>>>     --data
>>>     > /dev/sdd --block.db /dev/sdf3
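The per-disk sequence quoted above (zap the data disk, carve a 133 GiB block.db partition on the shared SSD, create the OSD) can be sketched as a dry-run loop; device names and sizes are taken from the quoted commands, and the loop only prints what would run:

```shell
#!/bin/sh
# Dry-run sketch of the quoted per-disk conversion steps. Prints the
# commands instead of executing them; run them by hand (or swap echo out)
# only after verifying devices and partition numbers on your own hosts.
db_dev=/dev/sdf
plan=$(
    i=1
    for data_dev in /dev/sdb /dev/sdc /dev/sdd; do
        echo "ceph-volume lvm zap $data_dev"
        echo "sgdisk -n 0:0:+133GiB -t 0:FFFF -c $i:\"ceph block.db ${data_dev#/dev/}\" $db_dev"
        echo "ceph-volume lvm create --bluestore --crush-device-class hdd --data $data_dev --block.db ${db_dev}$i"
        i=$((i + 1))
    done
)
printf '%s\n' "$plan"
```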
>>>     >
>>>     >
>>>     > _______________________________________________
>>>     > ceph-users mailing list
>>>     > ceph-users@xxxxxxxxxxxxxx
>>>     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>     >
>>>
>>>
>>>
>>





