Hi Igor,

where should I post the logs?

On 06.05.20 at 09:23, Stefan Priebe - Profihost AG wrote:
> Hi Igor,
>
> On 05.05.20 at 16:10, Igor Fedotov wrote:
>> Hi Stefan,
>>
>> so (surprise!) some DB access counters show a significant difference, e.g.
>>
>> "kv_flush_lat": {
>>     "avgcount": 1423,
>>     "sum": 0.000906419,
>>     "avgtime": 0.000000636
>> },
>> "kv_sync_lat": {
>>     "avgcount": 1423,
>>     "sum": 0.712888091,
>>     "avgtime": 0.000500975
>> },
>>
>> vs.
>>
>> "kv_flush_lat": {
>>     "avgcount": 1146,
>>     "sum": 3.346228802,
>>     "avgtime": 0.002919920
>> },
>> "kv_sync_lat": {
>>     "avgcount": 1146,
>>     "sum": 3.754915016,
>>     "avgtime": 0.003276540
>> },
>>
>> Also for bluefs:
>> "bytes_written_sst": 0,
>> vs.
>> "bytes_written_sst": 59785361,
>>
>> Could you please rerun these benchmark/perf counter gathering steps a
>> couple more times and check if the difference is persistent.
>
> I reset all perf counters and ran the bench 10 times on each OSD.
>
> OSD 38:
> bench: wrote 12 MiB in blocks of 4 KiB in 1.22796 sec at 9.5 MiB/sec 2.44k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.26407 sec at 9.3 MiB/sec 2.37k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.24987 sec at 9.4 MiB/sec 2.40k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.37125 sec at 8.5 MiB/sec 2.19k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.25549 sec at 9.3 MiB/sec 2.39k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.24358 sec at 9.4 MiB/sec 2.41k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.24208 sec at 9.4 MiB/sec 2.42k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.2433 sec at 9.4 MiB/sec 2.41k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.26548 sec at 9.3 MiB/sec 2.37k IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 1.31509 sec at 8.9 MiB/sec 2.28k IOPS
>
> kv_flush_lat.sum: 8.955978864
> kv_sync_lat.sum: 10.869536503
> bytes_written_sst: 0
>
> OSD 0:
> bench: wrote 12 MiB in blocks of 4 KiB in 5.71447 sec at 2.1 MiB/sec 524 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 6.18679 sec at 1.9 MiB/sec 484 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 6.69068 sec at 1.8 MiB/sec 448 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 7.06413 sec at 1.7 MiB/sec 424 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 7.50321 sec at 1.6 MiB/sec 399 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 6.86882 sec at 1.7 MiB/sec 436 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 7.11702 sec at 1.6 MiB/sec 421 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 7.10497 sec at 1.6 MiB/sec 422 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 6.69801 sec at 1.7 MiB/sec 447 IOPS
> bench: wrote 12 MiB in blocks of 4 KiB in 7.13588 sec at 1.6 MiB/sec 420 IOPS
>
> kv_flush_lat.sum: 0.003866224
> kv_sync_lat.sum: 2.667407139
> bytes_written_sst: 34904457
>
>> If that's particularly true for the "kv_flush_lat" counter - please rerun
>> with debug-bluefs set to 20 and collect OSD logs for both cases.
>
> Yes, it's still true for kv_flush_lat - see above. Where should I upload /
> put those logs?
>
> Greets,
> Stefan
>
>> Thanks,
>> Igor
>>
>> On 5/5/2020 11:46 AM, Stefan Priebe - Profihost AG wrote:
>>> Hello Igor,
>>>
>>> On 30.04.20 at 15:52, Igor Fedotov wrote:
>>>> 1) reset perf counters for the specific OSD
>>>>
>>>> 2) run bench
>>>>
>>>> 3) dump perf counters.
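
For completeness, the per-OSD sequence behind the numbers above is roughly the sketch below. It is only a sketch: the loop, the output file name and the debug_bluefs part are illustration (assumed admin-socket commands and the default log location), not a literal transcript, and the "ceph daemon" calls have to run on the host that carries the OSD.

for osd in 0 38; do
    ceph daemon osd.$osd perf reset all                   # 1) reset perf counters
    for i in $(seq 10); do                                # 2) run the bench 10 times
        ceph tell osd.$osd bench -f plain 12288000 4096
    done
    ceph daemon osd.$osd perf dump > perf.osd.$osd.json   # 3) dump perf counters
done

# For the debug-bluefs run you asked for, I would wrap a single bench like this
# and then grab /var/log/ceph/ceph-osd.$osd.log (assuming default log settings):
#   ceph daemon osd.$osd config set debug_bluefs 20
#   ceph tell osd.$osd bench -f plain 12288000 4096
#   ceph daemon osd.$osd config set debug_bluefs 1/5
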
>>> This is OSD 0:
>>>
>>> # ceph tell osd.0 bench -f plain 12288000 4096
>>> bench: wrote 12 MiB in blocks of 4 KiB in 6.70482 sec at 1.7 MiB/sec 447 IOPS
>>>
>>> https://pastebin.com/raw/hbKcU07g
>>>
>>> This is OSD 38:
>>>
>>> # ceph tell osd.38 bench -f plain 12288000 4096
>>> bench: wrote 12 MiB in blocks of 4 KiB in 2.01763 sec at 5.8 MiB/sec 1.49k IOPS
>>>
>>> https://pastebin.com/raw/Tx2ckVm1
>>>
>>>> Collecting disks' (both main and db) activity with iostat would be nice
>>>> too. But please either increase the benchmark duration or reduce the
>>>> iostat probe period to 0.1 or 0.05 seconds.
>>>
>>> This gives me:
>>>
>>> # ceph tell osd.38 bench -f plain 122880000 4096
>>> Error EINVAL: 'count' values greater than 12288000 for a block size of 4 KiB,
>>> assuming 100 IOPS, for 30 seconds, can cause ill effects on osd. Please adjust
>>> 'osd_bench_small_size_max_iops' with a higher value if you wish to use a higher 'count'.
>>>
>>> Stefan
>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 4/28/2020 8:42 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Igor,
>>>>>
>>>>> but the performance issue is still present even on the recreated OSD.
>>>>>
>>>>> # ceph tell osd.38 bench -f plain 12288000 4096
>>>>> bench: wrote 12 MiB in blocks of 4 KiB in 1.63389 sec at 7.2 MiB/sec 1.84k IOPS
>>>>>
>>>>> vs.
>>>>>
>>>>> # ceph tell osd.10 bench -f plain 12288000 4096
>>>>> bench: wrote 12 MiB in blocks of 4 KiB in 10.7454 sec at 1.1 MiB/sec 279 IOPS
>>>>>
>>>>> both backed by the same SAMSUNG SSD as block.db.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> On 28.04.20 at 19:12, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Igor,
>>>>>> On 27.04.20 at 15:03, Igor Fedotov wrote:
>>>>>>> Just left a comment at https://tracker.ceph.com/issues/44509
>>>>>>>
>>>>>>> Generally bdev-new-db performs no migration; RocksDB might eventually do
>>>>>>> that, but there is no guarantee it moves everything.
>>>>>>>
>>>>>>> One should use bluefs-bdev-migrate to do the actual migration.
>>>>>>>
>>>>>>> And I think that's the root cause for the above ticket.
>>>>>> Perfect - this removed all spillover in seconds.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Igor
>>>>>>>
>>>>>>> On 4/24/2020 2:37 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>>> No, not a standalone WAL - I wanted to ask whether bdev-new-db migrated
>>>>>>>> the DB and WAL from HDD to SSD.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>> On 24.04.2020 at 13:01, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Unless you have 3 different types of disks beyond OSD (e.g. HDD, SSD,
>>>>>>>>> NVMe), a standalone WAL makes no sense.
>>>>>>>>>
>>>>>>>>> On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Is the WAL device missing? Do I need to run *bluefs-bdev-new-db and
>>>>>>>>>> WAL*?
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>> On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Igor,
>>>>>>>>>>>
>>>>>>>>>>> there must be a difference. I purged osd.0 and recreated it.
>>>>>>>>>>>
>>>>>>>>>>> Now it gives:
>>>>>>>>>>> ceph tell osd.0 bench
>>>>>>>>>>> {
>>>>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>>>>     "blocksize": 4194304,
>>>>>>>>>>>     "elapsed_sec": 8.1554735639999993,
>>>>>>>>>>>     "bytes_per_sec": 131659040.46819863,
>>>>>>>>>>>     "iops": 31.389961354303033
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> What's wrong with adding a block.db device later?
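
For anyone following along, the conversion that finally removed the spillover boils down to the two steps sketched below. This is a sketch under my assumptions - the OSD is stopped first, the LV path is from my setup, and the bluefs-bdev-migrate source/target arguments are how I read the ceph-bluestore-tool help - not a verified recipe:

OSD=0
systemctl stop ceph-osd@$OSD

# Step 1: attach a new DB device. By itself this does NOT move existing data.
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-$OSD \
    bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1

# Step 2: migrate BlueFS data off the slow device onto the new DB device.
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-$OSD \
    bluefs-bdev-migrate --devs-source /var/lib/ceph/osd/ceph-$OSD/block \
    --dev-target /var/lib/ceph/osd/ceph-$OSD/block.db

systemctl start ceph-osd@$OSD
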
>>>>>>>>>>>
>>>>>>>>>>> Stefan
>>>>>>>>>>>
>>>>>>>>>>> On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> if the OSDs are idle the difference is even worse:
>>>>>>>>>>>>
>>>>>>>>>>>> # ceph tell osd.0 bench
>>>>>>>>>>>> {
>>>>>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>>>>>     "blocksize": 4194304,
>>>>>>>>>>>>     "elapsed_sec": 15.396707875000001,
>>>>>>>>>>>>     "bytes_per_sec": 69738403.346825853,
>>>>>>>>>>>>     "iops": 16.626931034761871
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> # ceph tell osd.38 bench
>>>>>>>>>>>> {
>>>>>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>>>>>     "blocksize": 4194304,
>>>>>>>>>>>>     "elapsed_sec": 6.8903985170000004,
>>>>>>>>>>>>     "bytes_per_sec": 155831599.77624846,
>>>>>>>>>>>>     "iops": 37.153148597776521
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Stefan
>>>>>>>>>>>>
>>>>>>>>>>>> On 23.04.20 at 14:39, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> On 23.04.20 at 14:06, Igor Fedotov wrote:
>>>>>>>>>>>>>> I don't recall any additional tuning to be applied to the new DB
>>>>>>>>>>>>>> volume. And I assume the hardware is pretty much the same...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you still have any significant amount of data spilled over
>>>>>>>>>>>>>> for these updated OSDs? If not, I don't have any valid
>>>>>>>>>>>>>> explanation for the phenomenon.
>>>>>>>>>>>>> just the 64k from here:
>>>>>>>>>>>>> https://tracker.ceph.com/issues/44509
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You might want to try "ceph osd bench" to compare OSDs under
>>>>>>>>>>>>>> pretty much the same load. Any difference observed?
>>>>>>>>>>>>> Servers are the same HW. OSD bench is:
>>>>>>>>>>>>> # ceph tell osd.0 bench
>>>>>>>>>>>>> {
>>>>>>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>>>>>>     "blocksize": 4194304,
>>>>>>>>>>>>>     "elapsed_sec": 16.091414781000001,
>>>>>>>>>>>>>     "bytes_per_sec": 66727620.822242722,
>>>>>>>>>>>>>     "iops": 15.909104543266945
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> # ceph tell osd.36 bench
>>>>>>>>>>>>> {
>>>>>>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>>>>>>     "blocksize": 4194304,
>>>>>>>>>>>>>     "elapsed_sec": 10.023828538,
>>>>>>>>>>>>>     "bytes_per_sec": 107118933.6419194,
>>>>>>>>>>>>>     "iops": 25.539143953780986
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G
>>>>>>>>>>>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
>>>>>>>>>>>>>
>>>>>>>>>>>>> The SSDs are all the same, like the rest of the HW, and both drives
>>>>>>>>>>>>> should give the same performance according to their specs. The only
>>>>>>>>>>>>> other difference is that OSD 36 was directly created with the
>>>>>>>>>>>>> block.db device (Nautilus 14.2.7) while OSD 0 (14.2.8) was not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stefan
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> is there anything else needed besides running:
>>>>>>>>>>>>>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did so some weeks ago, and currently I'm seeing that all OSDs
>>>>>>>>>>>>>>> originally deployed with --block-db show 10-20% I/O waits, while
>>>>>>>>>>>>>>> all those converted using ceph-bluestore-tool show 80-100%
>>>>>>>>>>>>>>> I/O waits.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, is there some tuning available to use more of the SSD? The
>>>>>>>>>>>>>>> SSD (block.db) is only saturated at 0-2%.
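
Side note on measuring that saturation: I sample the devices while the bench runs, roughly as below. The 0.1 s probe period is what Igor suggested and assumes the installed sysstat accepts fractional intervals; if it only takes whole seconds, use 1 and a longer bench instead. The log file name is just an example.

OSD=38
iostat -xmt 0.1 > iostat.osd.$OSD.log &           # extended stats for all devices, 0.1 s probes
ceph tell osd.$OSD bench -f plain 12288000 4096
kill $!                                           # stop iostat once the bench is done
# then compare %util of the OSD's main (HDD) device vs. the SSD holding block.db
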
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Greets,
>>>>>>>>>>>>>>> Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx