Hello cephers,

we have a few systems which use an rbd-nbd map/mount to get access to an rbd
volume. Unfortunately the rbd-nbd device of one system has crashed three
Mondays in a row at ~00:00, when the systemd fstrim timer executes
"fstrim -av". After that the device constantly reports I/O errors whenever
the filesystem is accessed. Manual 30-minute stress tests using the following
fio command did not produce any problems on the client side:

The client kernel throws messages like this:

May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 60 2)

Ceph throws messages like this:

The parameters of our environment:

It seems that others also experienced this problem:
https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash

Do you have suggestions how to analyze/solve the situation?

Marc
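Since the crash correlates with the fstrim timer, a first analysis step could
look like this (a sketch only: device, mountpoint, and pool/image names are
placeholders, and the --timeout map option is only available in newer rbd-nbd
builds - check "rbd-nbd --help"):

```shell
# Placeholders: pool "rbd", image "myimage", /dev/nbd0, /mnt/volume.

# Reproduce the trim manually, outside the fstrim.timer window:
fstrim -v /mnt/volume

# Check for nbd errors right after the trim:
dmesg -T | tail -n 50

# If the device dies under trim load, remapping with a longer device
# timeout may help (newer rbd-nbd builds only):
umount /mnt/volume
rbd-nbd unmap /dev/nbd0
rbd-nbd map --timeout 120 rbd/myimage
mount /dev/nbd0 /mnt/volume
```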
Regards

On 13.05.19 07:40, EDH - Manuel Rios Fernandez wrote:
Hi Marc,

Try to compact the OSD with slow requests:

ceph tell osd.[ID] compact

This will take the OSD offline for some seconds (SSD) to minutes (HDD) and
perform a compaction of the OMAP database.

Regards,

-----Original Message-----
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of Marc Schöchlin
Sent: Monday, 13 May 2019 6:59
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Slow requests from bluestore osds

Hello cephers,

one week ago we replaced the bluestore cache size settings by "osd memory
target" and removed the detailed memory settings. This storage class now runs
42 * 8TB spinners with a permanent write workload of 2000-3000 write IOPS and
1200-8000 read IOPS. Our new setup is now (12.2.10 on Ubuntu 16.04):

[osd]
osd deep scrub interval = 2592000
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub load threshold = 6
osd scrub sleep = 0.3
osd snap trim sleep = 0.4
pg max concurrent snap trims = 1

[osd.51]
osd memory target = 8589934592
...

After restarting the entire cluster with these settings we were very happy to
not see any slow requests for 7 days. Unfortunately this night the slow
requests returned on one osd, without any known change in the workload of the
last 14 days (according to our detailed monitoring):

2019-05-12 22:00:00.000117 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-12 23:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:00.000129 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 00:00:44.069793 mon.ceph-mon-s43 [WRN] Health check failed: 416 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:50.151190 mon.ceph-mon-s43 [WRN] Health check update: 439 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:00:59.750398 mon.ceph-mon-s43 [WRN] Health check update: 452 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:04.750697 mon.ceph-mon-s43 [WRN] Health check update: 283 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:10.419801 mon.ceph-mon-s43 [WRN] Health check update: 230 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:19.751516 mon.ceph-mon-s43 [WRN] Health check update: 362 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:24.751822 mon.ceph-mon-s43 [WRN] Health check update: 324 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:30.675160 mon.ceph-mon-s43 [WRN] Health check update: 341 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:38.759012 mon.ceph-mon-s43 [WRN] Health check update: 390 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:44.858392 mon.ceph-mon-s43 [WRN] Health check update: 366 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:54.753388 mon.ceph-mon-s43 [WRN] Health check update: 352 slow requests are blocked > 32 sec. Implicated osds 51 (REQUEST_SLOW)
2019-05-13 00:01:59.045220 mon.ceph-mon-s43 [INF] Health check cleared: REQUEST_SLOW (was: 168 slow requests are blocked > 32 sec. Implicated osds 51)
2019-05-13 00:01:59.045257 mon.ceph-mon-s43 [INF] Cluster is now healthy
2019-05-13 01:00:00.000114 mon.ceph-mon-s43 [INF] overall HEALTH_OK
2019-05-13 02:00:00.000130 mon.ceph-mon-s43 [INF] overall HEALTH_OK

The output of a "ceph health detail" loop at the time the problem occurred:

Mon May 13 00:01:27 CEST 2019
HEALTH_WARN 324 slow requests are blocked > 32 sec. Implicated osds 51
REQUEST_SLOW 324 slow requests are blocked > 32 sec.
    Implicated osds 51
    324 ops are blocked > 32.768 sec
    osd.51 has blocked requests > 32.768 sec

The logfile of the OSD:

2019-05-12 23:57:28.767463 7f38da4e2700  4 rocksdb: (Original Log Time 2019/05/12-23:57:28.767419) [/build/ceph-12.2.10/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 4 21 122 0 0 0] max score 0.94
2019-05-12 23:57:28.767511 7f38da4e2700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_files.cc:388] [JOB 2991] Try to delete WAL files size 256700142, prev total WAL file size 257271487, number of live WAL files 2.
2019-05-12 23:58:07.816376 7f38ddce9700  0 log_channel(cluster) log [DBG] : 34.ac scrub ok
2019-05-12 23:59:54.070025 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub starts
2019-05-13 00:02:21.818689 7f38de4ea700  0 log_channel(cluster) log [DBG] : 34.236 scrub ok
2019-05-13 00:04:37.613094 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:684] reusing log 422507 from recycle list
2019-05-13 00:04:37.613186 7f38ead03700  4 rocksdb: [/build/ceph-12.2.10/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #422511. Immutable memtables: 0.

Any hints how to find more details about the origin of this problem? How can
we solve it?

Regards
Marc

On 28.01.19 22:27, Marc Schöchlin wrote:

Hello cephers,

as described, we also have the slow requests in our setup. We recently
updated from ceph 12.2.4 to 12.2.10, updated Ubuntu 16.04 to the latest
patch level (with kernel 4.15.0-43) and applied Dell firmware 2.8.0.

On 12.2.5 (before updating the cluster) we had slow requests at a frequency
of 10 to 30 minutes over the entire deep-scrub window between 8:00 PM and
6:00 AM, especially between 04:00 AM and 06:00 AM when we sequentially
create an rbd snapshot for every rbd image and delete an outdated snapshot
(we hold 3 snapshots per rbd device).
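As a side note for watching this over time: the per-OSD REQUEST_SLOW counts
in a mon/cluster log like the excerpts above can be summarized with a small
shell/awk sketch (my own helper, not part of any Ceph tooling):

```shell
# Summarize how often each OSD is implicated in REQUEST_SLOW warnings.
# Reads mon/cluster log lines (format as in the excerpts above) from
# files given as arguments, or from stdin.
summarize_slow_osds() {
    awk '/REQUEST_SLOW/ && match($0, /Implicated osds [0-9, ]+/) {
        n = split(substr($0, RSTART + 16, RLENGTH - 16), ids, /[ ,]+/)
        for (i = 1; i <= n; i++) if (ids[i] != "") count[ids[i]]++
    }
    END { for (id in count) printf "osd.%s: %d\n", id, count[id] }' "$@"
}
```

Usage: summarize_slow_osds /var/log/ceph/ceph.log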
After the upgrade to 12.2.10 (and the other patches) the slow requests seem
to be reduced, but they still occur after the snapshot creation/deletion
procedure. Today we changed the time of the creation/deletion procedure from
4:00 AM to 7:30 PM, and we experienced slow requests right in the snapshot
process at 8:00 PM.

The slow requests only happen on the osds of a certain storage class
(30 * 8TB spinners); i.e. ssd osds on the same cluster do not have this
problem. The pools which use this storage class see 80% write requests.
Our configuration looks like this:

---
bluestore cache kv max = 2147483648
bluestore cache kv ratio = 0.9
bluestore cache meta ratio = 0.1
bluestore cache size hdd = 10737418240
osd deep scrub interval = 2592000
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub load threshold = 4
osd scrub sleep = 0.3
osd max trimming pgs = 2
---

We do not have many devices in this storage class (an enhancement is in
progress to get more iops). What can I do to decrease the impact of
snaptrims to prevent slow requests? (e.g. reduce "osd max trimming pgs"
to "1")

Regards
Marc Schöchlin

On 03.09.18 10:13, Marc Schöchlin wrote:

Hi,

we are also experiencing this type of behavior for some weeks on our not so
performance-critical hdd pools.
We haven't spent much time on this problem yet, because there are currently
more important tasks, but here are a few details.

Running the following loop produces the following output:

while true; do ceph health | grep -q HEALTH_OK || (date; ceph health detail); sleep 2; done

Sun Sep 2 20:59:47 CEST 2018
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    4 ops are blocked > 32.768 sec
    osd.43 has blocked requests > 32.768 sec
Sun Sep 2 20:59:50 CEST 2018
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    4 ops are blocked > 32.768 sec
    osd.43 has blocked requests > 32.768 sec
Sun Sep 2 20:59:52 CEST 2018
HEALTH_OK
Sun Sep 2 21:00:28 CEST 2018
HEALTH_WARN 1 slow requests are blocked > 32 sec
REQUEST_SLOW 1 slow requests are blocked > 32 sec
    1 ops are blocked > 32.768 sec
    osd.41 has blocked requests > 32.768 sec
Sun Sep 2 21:00:31 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,41 have blocked requests > 32.768 sec
Sun Sep 2 21:00:33 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,51 have blocked requests > 32.768 sec
Sun Sep 2 21:00:35 CEST 2018
HEALTH_WARN 7 slow requests are blocked > 32 sec
REQUEST_SLOW 7 slow requests are blocked > 32 sec
    7 ops are blocked > 32.768 sec
    osds 35,51 have blocked requests > 32.768 sec

Our details:

* system details:
  * Ubuntu 16.04
  * Kernel 4.13.0-39
  * 30 * 8 TB Disk (SEAGATE/ST8000NM0075)
  * 3 * Dell PowerEdge R730xd (Firmware 2.50.50.50)
  * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
  * 2 * 10 GBit/s SFP+ network adapters
  * 192GB RAM
* Pools use replication factor 3, 2MB object size, 85% write load, 1700 write IOPS/sec (ops mainly between 4k and 16k size), 300 read IOPS/sec
* we have the impression that this appears on deep-scrub/scrub activity.
* Ceph 12.2.5; we already played with the OSD settings (our assumption was
  that the problem is related to rocksdb compaction):

  bluestore cache kv max = 2147483648
  bluestore cache kv ratio = 0.9
  bluestore cache meta ratio = 0.1
  bluestore cache size hdd = 10737418240

* this type of problem only appears on hdd/bluestore osds; ssd/bluestore
  osds never experienced it
* the system is healthy: no swapping, no high load, no errors in dmesg

I attached a log excerpt of osd.35 - probably this is useful for
investigating the problem if someone has deeper bluestore knowledge.
(slow requests appeared on Sun Sep 2 21:00:35)

Regards
Marc

On 02.09.2018 15:50, Brett Chancellor wrote:

The warnings look like this.

6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests

On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <adeza@xxxxxxxxxx <mailto:adeza@xxxxxxxxxx>> wrote:

    On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
    <bchancellor@xxxxxxxxxxxxxx <mailto:bchancellor@xxxxxxxxxxxxxx>> wrote:
    > Hi Cephers,
    >   I am in the process of upgrading a cluster from Filestore to bluestore,
    > but I'm concerned about frequent warnings popping up against the new
    > bluestore devices. I'm frequently seeing messages like this; although the
    > specific osd changes, it's always one of the few hosts I've converted to
    > bluestore.
    >
    > 6 ops are blocked > 32.768 sec on osd.219
    > 1 osds have slow requests
    >
    > I'm running 12.2.4, have any of you seen similar issues? It seems as though
    > these messages pop up more frequently when one of the bluestore pgs is
    > involved in a scrub. I'll include my bluestore creation process below, in
    > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and sdf are
    > SSD)

    Would be useful to include what those warnings say.
    The ceph-volume commands look OK to me

    >
    > ## Process used to create osds
    > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    > sudo ceph-volume lvm zap /dev/sdb
    > sudo ceph-volume lvm zap /dev/sdc
    > sudo ceph-volume lvm zap /dev/sdd
    > sudo ceph-volume lvm zap /dev/sde
    > sudo ceph-volume lvm zap /dev/sdf
    > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb" /dev/sdf
    > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc" /dev/sdf
    > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd" /dev/sdf
    > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde" /dev/sdf
    > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdb --block.db /dev/sdf1
    > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdc --block.db /dev/sdf2
    > sudo ceph-volume lvm create --bluestore --crush-device-class hdd --data /dev/sdd --block.db /dev/sdf3
    >
    > _______________________________________________
    > ceph-users mailing list
    > ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
    > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
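For the recurring question in this thread of how to find more details about
the origin of slow requests on a single OSD, a short sketch (osd.51 as in
the logs above; assumes access to the OSD's admin socket on its host):

```shell
# Recently completed ops with the longest durations:
ceph daemon osd.51 dump_historic_ops

# Ops currently in flight:
ceph daemon osd.51 dump_ops_in_flight

# Online compaction of the OSD's RocksDB/OMAP data, as suggested earlier
# in the thread (the OSD stalls briefly while compacting):
ceph tell osd.51 compact
```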