Re: Slow requests from bluestore osds / crashing rbd-nbd

Hi Jason,

Should we disable the fstrim services inside VMs which run on top of RBD?

I recall Ubuntu has a weekly fstrim cron job enabled by default, while we have to enable the fstrim service manually on Debian and CentOS.
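
On Ubuntu that is typically the weekly fstrim.timer shipped with util-linux; a minimal sketch for checking it and either disabling it inside the guest or spreading its start time so not all VMs trim at once (values are only examples):

systemctl list-timers fstrim.timer          # is the weekly timer active?
sudo systemctl disable --now fstrim.timer   # option 1: disable trimming in the guest
sudo systemctl edit fstrim.timer            # option 2: keep it, but add under [Timer]:
                                            #   RandomizedDelaySec=6h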


Kind regards,

Charles Alva
Sent from Gmail Mobile

On Tue, May 21, 2019, 4:49 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd volume.
> (This problem seems to be related to " Slow requests from bluestore osds" (the original thread))
>
> Unfortunately the rbd-nbd device of one system has crashed three Mondays in a row at ~00:00, when the systemd fstrim timer executes "fstrim -av".
> (which runs in parallel to deep scrub operations)

That's probably not a good practice if you have lots of VMs doing this
at the same time *and* you are not using object-map. The reason is
that "fstrim" could discard huge extents, resulting in around a thousand
concurrent remove/truncate/zero ops per image being thrown at your
cluster.
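
If object-map is not enabled on the affected images, a rough sketch of checking and enabling it (the image spec is a placeholder; object-map requires exclusive-lock, and existing images need a rebuild afterwards):

rbd info rbd/myimage                                         # check the "features:" line
rbd feature enable rbd/myimage exclusive-lock object-map fast-diff
rbd object-map rebuild rbd/myimage                           # populate the map for an existing image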

> After that, the device constantly reports I/O errors every time the filesystem is accessed.
> Unmounting, remapping and mounting helped to get the filesystem/device back into business :-)
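>
> For reference, the recovery amounts to something like this (device, image and mount point names are placeholders):
>
> umount /mnt/rbd-volume
> rbd-nbd unmap /dev/nbd0
> rbd-nbd map rbd/myimage
> mount /dev/nbd0 /mnt/rbd-volume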

If the cluster was being DDoSed by the fstrims, the VM OSes might
have timed out, assuming a controller failure.

> Manual 30-minute stress tests using the following fio command did not produce any problems on the client side
> (the Ceph storage reported some slow requests while testing).
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=50 --numjobs=50 --loops=10
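>
> (The mix above only exercises reads and writes, not discards; the actual trigger could be reproduced more directly by timing a manual trim of the mounted filesystem during a controlled window - the mount point below is just a placeholder.)
>
> time fstrim -v /mnt/rbd-volume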
>
> It seems that others also experienced this problem: https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
> The change for setting device timeouts does not seem to have been merged into luminous.
> Experiments setting the timeout manually after mapping using https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c haven't changed the situation.
>
> Do you have suggestions how to analyze/solve the situation?
>
> Regards
> Marc
> ________________________________
>
>
>
> The client kernel throws messages like this:
>
> May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 60 2)
> May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: Connection timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting down sockets
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O error, dev nbd0, sector 84082280
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: Connection timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O error, dev nbd0, sector 92470887
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: Connection timed out
>
> Ceph throws messages like this:
>
> 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173572 : cluster [INF] overall HEALTH_OK
> 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173586 : cluster [WRN] Health check failed: 644 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173587 : cluster [WRN] Health check update: 594 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173591 : cluster [WRN] Health check update: 505 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173592 : cluster [WRN] Health check update: 497 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173593 : cluster [WRN] Health check update: 509 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:28.721891 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173594 : cluster [WRN] Health check update: 501 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:34.909842 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173596 : cluster [WRN] Health check update: 494 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:44.770330 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173597 : cluster [WRN] Health check update: 500 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:49.770625 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173599 : cluster [WRN] Health check update: 608 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:55.073734 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173600 : cluster [WRN] Health check update: 593 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:04.771432 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173607 : cluster [WRN] Health check update: 552 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:09.771730 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173609 : cluster [WRN] Health check update: 720 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:19.393803 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173610 : cluster [WRN] Health check update: 539 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:25.474605 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173611 : cluster [WRN] Health check update: 527 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:34.773039 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173612 : cluster [WRN] Health check update: 496 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:39.773312 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173613 : cluster [WRN] Health check update: 493 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:44.773604 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173614 : cluster [WRN] Health check update: 528 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:49.801997 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173616 : cluster [WRN] Health check update: 537 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:59.779779 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173617 : cluster [WRN] Health check update: 520 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:04.780074 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173622 : cluster [WRN] Health check update: 493 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:10.073854 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173624 : cluster [WRN] Health check update: 452 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:19.780877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173625 : cluster [WRN] Health check update: 515 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:24.781177 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173626 : cluster [WRN] Health check update: 540 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:30.321540 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173627 : cluster [WRN] Health check update: 545 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:39.781968 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173628 : cluster [WRN] Health check update: 508 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:44.782261 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173629 : cluster [WRN] Health check update: 469 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:50.610639 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173630 : cluster [WRN] Health check update: 513 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:03:58.724045 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173631 : cluster [WRN] Health check update: 350 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:04:04.801989 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173638 : cluster [WRN] Health check update: 356 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:04:14.783787 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173640 : cluster [WRN] Health check update: 395 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:04:19.234877 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173641 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 238 slow requests are blocked > 32 sec. Implicated osds 51)
> 2019-05-20 00:04:19.234921 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173642 : cluster [INF] Cluster is now healthy
> 2019-05-20 01:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 174035 : cluster [INF] overall HEALTH_OK
>
> The parameters of our environment:
>
> Storage System (OSDs and MONs)
>
> Ceph 12.2.11
> Ubuntu 16.04/18.04
> 30 * 8TB spinners distributed over
>
> Client
>
> Ceph 12.2.11
> Ubuntu 18.04 / 64 Bit
>
> ceph.conf:
> [global]
> fsid = <redacted>
> mon host = <redacted>
> public network = <redacted>
>
> [client]
> rbd cache = true
> rbd cache size = 536870912
> rbd cache max dirty = 268435456
> rbd cache target dirty = 134217728
> rbd cache max dirty age = 30
> rbd readahead max bytes = 4194304
>
>
> Regards
> Marc
>
> On 13.05.19 at 07:40, EDH - Manuel Rios Fernandez wrote:
>
> Hi Marc,
>
> Try to compact the OSD with slow requests:
>
> ceph tell osd.[ID] compact
>
> This will take the OSD offline for some seconds (SSD) to minutes (HDD) and perform a compaction of the OMAP database.
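>
> For the OSD implicated above, that would be for example:
>
> ceph tell osd.51 compact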
>
> Regards,
>
>
>
>
> -----Original Message-----
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of Marc Schöchlin
> Sent: Monday, May 13, 2019 6:59
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Slow requests from bluestore osds
>
> Hello cephers,
>
> one week ago we replaced the bluestore cache size settings with "osd memory target" and removed the detailed memory settings.
> This storage class now runs 42*8TB spinners with a permanent workload of 2000-3000 write IOPS and 1200-8000 read IOPS.
>
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent snap trims = 1
>
> [osd.51]
> osd memory target = 8589934592
> ...
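>
> To confirm the daemon actually picked up the value, the admin socket on the OSD host can be queried, e.g.:
>
> ceph daemon osd.51 config get osd_memory_target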
>
> After that (restarting the entire cluster with these settings) we were very happy to not see any slow requests for 7 days.
>
> Unfortunately, last night the slow requests returned on one osd, without any known change in the workload over the last 14 days (according to our detailed monitoring).
>
> 2019-05-12 22:00:00.000117 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-12 23:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 00:00:00.000129 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 00:00:44.069793 mon.ceph-mon-s43 [WRN] Health check failed: 416 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:00:50.151190 mon.ceph-mon-s43 [WRN] Health check update: 439 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:00:59.750398 mon.ceph-mon-s43 [WRN] Health check update: 452 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:04.750697 mon.ceph-mon-s43 [WRN] Health check update: 283 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:10.419801 mon.ceph-mon-s43 [WRN] Health check update: 230 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:19.751516 mon.ceph-mon-s43 [WRN] Health check update: 362 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:24.751822 mon.ceph-mon-s43 [WRN] Health check update: 324 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:30.675160 mon.ceph-mon-s43 [WRN] Health check update: 341 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:38.759012 mon.ceph-mon-s43 [WRN] Health check update: 390 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:44.858392 mon.ceph-mon-s43 [WRN] Health check update: 366 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:54.753388 mon.ceph-mon-s43 [WRN] Health check update: 352 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
> 2019-05-13 00:01:59.045220 mon.ceph-mon-s43 [INF] Health check cleared: REQUEST_SLOW (was: 168 slow requests are blocked > 32 sec. Implicated osds 51)
> 2019-05-13 00:01:59.045257 mon.ceph-mon-s43 [INF] Cluster is now healthy
> 2019-05-13 01:00:00.000114 mon.ceph-mon-s43 [INF] overall HEALTH_OK
> 2019-05-13 02:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
>
>
> The output of a "ceph health detail" loop at the time the problem occurred:
>
> Mon May 13 00:01:27 CEST 2019
> HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51
> REQUEST_SLOW 324 slow requests are blocked > 32 sec. Implicated osds 51
>     324 ops are blocked > 32.768 sec
>     osd.51 has blocked requests > 32.768 sec
>
> The logfile of the OSD:
>
> 2019-05-12 23:57:28.767463 7f38da4e2700  4 rocksdb: (Original Log Time 2019/05/12-23:57:28.767419) [/build/ceph-12.2.10/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 4 21 122 0 0 0] max score 0.94
>
> 2019-05-12 23:57:28.767511 7f38da4e2700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_files.cc:388] [JOB 2991] Try to delete WAL files size 256700142, prev total WAL file size 257271487, number of live WAL files 2.
>
> 2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 34.ac scrub ok
> 2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub starts
> 2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub ok
> 2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 from recycle list
>
> 2019-05-13 00:04:37.613186 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #422511. Immutable memtables: 0.
>
> Any hints on how to find more details about the origin of this problem?
> How can we solve that?
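>
> One way to dig deeper, assuming shell access to the host of the implicated OSD while the warning is active (a sketch):
>
> ceph daemon osd.51 dump_ops_in_flight   # ops currently in flight, with state and age
> ceph daemon osd.51 dump_historic_ops    # slowest recent ops, with per-stage timestamps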
>
> Regards
> Marc
>
> On 28.01.19 at 22:27, Marc Schöchlin wrote:
>
> Hello cephers,
>
> as described - we also have the slow requests in our setup.
>
> We recently updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the latest patchlevel (with kernel 4.15.0-43) and applied dell firmware 2.8.0.
>
> On 12.2.5 (before updating the cluster) we had slow requests every 10 to 30 minutes throughout the entire deep-scrub window between 8:00 PM and 6:00 AM,
> especially between 04:00 AM and 06:00 AM, when we sequentially create an rbd snapshot for every rbd image and delete an outdated one (we keep 3 snapshots per rbd device).
>
> After the upgrade to 12.2.10 (and the other patches) the slow requests seem to be reduced, but they still occur after the snapshot creation/deletion procedure.
> Today we changed the time of the creation/deletion procedure from 4:00 AM to 7:30 PM, and we experienced slow requests right in the snapshot process at 8:00 PM.
>
> The slow requests only happen on the osds of a certain storage class (30 * 8TB spinners) - i.e. ssd osds do not have this problem on the same cluster.
> The pools which use this storage class see about 80% write requests.
>
> Our configuration looks like this:
> ---
> bluestore cache kv max = 2147483648
> bluestore cache kv ratio = 0.9
> bluestore cache meta ratio = 0.1
> bluestore cache size hdd = 10737418240
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 4
> osd scrub sleep = 0.3
> osd max trimming pgs = 2
> ---
> We do not have that many devices in this storage class (an enhancement
> is in progress to get more iops).
>
> What can I do to decrease the impact of snap trims and prevent slow requests?
> (e.g. reduce "osd max trimming pgs" to "1")
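>
> Such a change could also be tried at runtime without restarting the OSDs; a sketch (values would need tuning for the workload):
>
> ceph tell osd.* injectargs '--osd_max_trimming_pgs 1 --osd_snap_trim_sleep 0.5'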
>
> Regards
> Marc Schöchlin
>
> On 03.09.18 at 10:13, Marc Schöchlin wrote:
>
> Hi,
>
> we have also been experiencing this type of behavior for some weeks on our
> not-so-performance-critical hdd pools.
> We haven't spent much time on this problem because there are
> currently more important tasks - but here are a few details:
>
> Running the following loop produces this output:
>
> while true; do ceph health | grep -q HEALTH_OK || (date; ceph health detail); sleep 2; done
>
> Sun Sep  2 20:59:47 CEST 2018
> HEALTH_WARN 4 slow requests are blocked > 32 sec
> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>     4 ops are blocked > 32.768 sec
>     osd.43 has blocked requests > 32.768 sec
> Sun Sep  2 20:59:50 CEST 2018
> HEALTH_WARN 4 slow requests are blocked > 32 sec
> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>     4 ops are blocked > 32.768 sec
>     osd.43 has blocked requests > 32.768 sec
> Sun Sep  2 20:59:52 CEST 2018
> HEALTH_OK
> Sun Sep  2 21:00:28 CEST 2018
> HEALTH_WARN 1 slow requests are blocked > 32 sec
> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>     1 ops are blocked > 32.768 sec
>     osd.41 has blocked requests > 32.768 sec
> Sun Sep  2 21:00:31 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,41 have blocked requests > 32.768 sec
> Sun Sep  2 21:00:33 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,51 have blocked requests > 32.768 sec
> Sun Sep  2 21:00:35 CEST 2018
> HEALTH_WARN 7 slow requests are blocked > 32 sec
> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>     7 ops are blocked > 32.768 sec
>     osds 35,51 have blocked requests > 32.768 sec
>
> Our details:
>
>   * system details:
>     * Ubuntu 16.04
>      * Kernel 4.13.0-39
>      * 30 * 8 TB Disk (SEAGATE/ST8000NM0075)
>      * 3* Dell Power Edge R730xd (Firmware 2.50.50.50)
>        * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>        * 2*10GBITS SFP+ Network Adapters
>        * 192GB RAM
>      * Pools are using replication factor 3, 2MB object size,
>        85% write load, 1700 write IOPS/sec
>        (ops mainly between 4k and 16k size), 300 read IOPS/sec
>   * we have the impression that this appears during deepscrub/scrub activity.
>   * Ceph 12.2.5; we already played with the OSD settings
>     (our assumption was that the problem is related to rocksdb compaction):
>     bluestore cache kv max = 2147483648
>     bluestore cache kv ratio = 0.9
>     bluestore cache meta ratio = 0.1
>     bluestore cache size hdd = 10737418240
>   * this type of problem only appears on hdd/bluestore osds; ssd/bluestore
>     osds never experienced that problem
>   * the system is healthy, no swapping, no high load, no errors in
> dmesg
>
> I attached a log excerpt of osd.35 - it is probably useful for
> investigating the problem if someone has deeper bluestore knowledge.
> (slow requests appeared on Sun Sep  2 21:00:35)
>
> Regards
> Marc
>
>
> On 02.09.2018 at 15:50, Brett Chancellor wrote:
>
> The warnings look like this.
>
> 6 ops are blocked > 32.768 sec on osd.219
> 1 osds have slow requests
>
> On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <adeza@xxxxxxxxxx
> <mailto:adeza@xxxxxxxxxx>> wrote:
>
>     On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>     <bchancellor@xxxxxxxxxxxxxx <mailto:bchancellor@xxxxxxxxxxxxxx>>
>     wrote:
>     > Hi Cephers,
>     >   I am in the process of upgrading a cluster from Filestore to
>     bluestore,
>     > but I'm concerned about frequent warnings popping up against the new
>     > bluestore devices. I'm frequently seeing messages like this,
>     although the
>     > specific osd changes, it's always one of the few hosts I've
>     converted to
>     > bluestore.
>     >
>     > 6 ops are blocked > 32.768 sec on osd.219
>     > 1 osds have slow requests
>     >
>     > I'm running 12.2.4, have any of you seen similar issues? It
>     seems as though
>     > these messages pop up more frequently when one of the bluestore
>     pgs is
>     > involved in a scrub.  I'll include my bluestore creation process
>     below, in
>     > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
>     sdf are
>     > SSD)
>
>     Would be useful to include what those warnings say. The ceph-volume
>     commands look OK to me
>
>     >
>     >
>     > ## Process used to create osds
>     > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sdd /dev/sde
>     /dev/sdf
>     > sudo ceph-volume lvm zap /dev/sdb
>     > sudo ceph-volume lvm zap /dev/sdc
>     > sudo ceph-volume lvm zap /dev/sdd
>     > sudo ceph-volume lvm zap /dev/sde
>     > sudo ceph-volume lvm zap /dev/sdf
>     > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb"
>     /dev/sdf
>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc"
>     /dev/sdf
>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd"
>     /dev/sdf
>     > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde"
>     /dev/sdf
>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>     --data
>     > /dev/sdb --block.db /dev/sdf1
>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>     --data
>     > /dev/sdc --block.db /dev/sdf2
>     > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>     --data
>     > /dev/sdd --block.db /dev/sdf3
>     >
>     >
>     > _______________________________________________
>     > ceph-users mailing list
>     > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
>     > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>     >
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
