Hi Igor,

the performance issue is still present even on the recreated OSD:

# ceph tell osd.38 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 1.63389 sec at 7.2 MiB/sec 1.84k IOPS

vs.

# ceph tell osd.10 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 10.7454 sec at 1.1 MiB/sec 279 IOPS

Both are backed by the same SAMSUNG SSD as block.db.

Greets,
Stefan

On 28.04.20 at 19:12, Stefan Priebe - Profihost AG wrote:
> Hi Igor,
> On 27.04.20 at 15:03, Igor Fedotov wrote:
>> Just left a comment at https://tracker.ceph.com/issues/44509
>>
>> Generally bdev-new-db performs no migration; RocksDB might eventually do
>> that, but there is no guarantee it moves everything.
>>
>> One should use bluefs-bdev-migrate to do the actual migration.
>>
>> And I think that's the root cause for the above ticket.
>
> Perfect - this removed all spillover in seconds.
>
> Greets,
> Stefan
>
>
>> Thanks,
>>
>> Igor
>>
>> On 4/24/2020 2:37 PM, Stefan Priebe - Profihost AG wrote:
>>> No, not a standalone WAL. I wanted to ask whether bdev-new-db migrated
>>> the DB and WAL from HDD to SSD.
>>>
>>> Stefan
>>>
>>>> On 24.04.2020 at 13:01, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>>>>
>>>> Unless you have 3 different types of disks behind the OSD (e.g. HDD,
>>>> SSD, NVMe), a standalone WAL makes no sense.
>>>>
>>>> On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Is the WAL device missing? Do I need to run bluefs-bdev-new-db and WAL?
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>>> On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG
>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi Igor,
>>>>>>
>>>>>> there must be a difference. I purged osd.0 and recreated it.
>>>>>>
>>>>>> Now it gives:
>>>>>> # ceph tell osd.0 bench
>>>>>> {
>>>>>>     "bytes_written": 1073741824,
>>>>>>     "blocksize": 4194304,
>>>>>>     "elapsed_sec": 8.1554735639999993,
>>>>>>     "bytes_per_sec": 131659040.46819863,
>>>>>>     "iops": 31.389961354303033
>>>>>> }
>>>>>>
>>>>>> What's wrong with adding a block.db device later?
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>> On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi,
>>>>>>> if the OSDs are idle the difference is even worse:
>>>>>>> # ceph tell osd.0 bench
>>>>>>> {
>>>>>>>     "bytes_written": 1073741824,
>>>>>>>     "blocksize": 4194304,
>>>>>>>     "elapsed_sec": 15.396707875000001,
>>>>>>>     "bytes_per_sec": 69738403.346825853,
>>>>>>>     "iops": 16.626931034761871
>>>>>>> }
>>>>>>> # ceph tell osd.38 bench
>>>>>>> {
>>>>>>>     "bytes_written": 1073741824,
>>>>>>>     "blocksize": 4194304,
>>>>>>>     "elapsed_sec": 6.8903985170000004,
>>>>>>>     "bytes_per_sec": 155831599.77624846,
>>>>>>>     "iops": 37.153148597776521
>>>>>>> }
>>>>>>> Stefan
>>>>>>> On 23.04.20 at 14:39, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi,
>>>>>>>> On 23.04.20 at 14:06, Igor Fedotov wrote:
>>>>>>>>> I don't recall any additional tuning to be applied to the new DB
>>>>>>>>> volume. And I assume the hardware is pretty much the same...
>>>>>>>>>
>>>>>>>>> Do you still have any significant amount of data spilled over
>>>>>>>>> for these updated OSDs? If not, I don't have any valid
>>>>>>>>> explanation for the phenomenon.
>>>>>>>>
>>>>>>>> Just the 64k from here:
>>>>>>>> https://tracker.ceph.com/issues/44509
>>>>>>>>
>>>>>>>>> You might want to try "ceph osd bench" to compare OSDs under
>>>>>>>>> pretty much the same load. Any difference observed?
>>>>>>>>
>>>>>>>> Servers are the same HW.
>>>>>>>> OSD bench is:
>>>>>>>> # ceph tell osd.0 bench
>>>>>>>> {
>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>     "blocksize": 4194304,
>>>>>>>>     "elapsed_sec": 16.091414781000001,
>>>>>>>>     "bytes_per_sec": 66727620.822242722,
>>>>>>>>     "iops": 15.909104543266945
>>>>>>>> }
>>>>>>>>
>>>>>>>> # ceph tell osd.36 bench
>>>>>>>> {
>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>     "blocksize": 4194304,
>>>>>>>>     "elapsed_sec": 10.023828538,
>>>>>>>>     "bytes_per_sec": 107118933.6419194,
>>>>>>>>     "iops": 25.539143953780986
>>>>>>>> }
>>>>>>>>
>>>>>>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G.
>>>>>>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G.
>>>>>>>>
>>>>>>>> The SSDs are all the same, like the rest of the HW, and both drives
>>>>>>>> should give the same performance according to their specs. The only
>>>>>>>> other difference is that OSD 36 was created directly with the
>>>>>>>> block.db device (Nautilus 14.2.7) while OSD 0 (14.2.8) was not.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> is there anything else needed besides running:
>>>>>>>>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>>>>>>>>>
>>>>>>>>>> I did so some weeks ago and currently I'm seeing that all OSDs
>>>>>>>>>> originally deployed with --block-db show 10-20% I/O waits, while
>>>>>>>>>> all those that were converted using ceph-bluestore-tool show
>>>>>>>>>> 80-100% I/O waits.
>>>>>>>>>>
>>>>>>>>>> Also, is there some tuning available to use more of the SSD? The
>>>>>>>>>> SSD (block-db) is only saturated at 0-2%.
>>>>>>>>>>
>>>>>>>>>> Greets,
>>>>>>>>>> Stefan
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
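
[Editor's note] For anyone hitting the same problem, the full sequence discussed in this thread (attaching a block.db later and then actually migrating the existing BlueFS data onto it, as Igor suggests) looks roughly like the sketch below. This is only a sketch: the ${OSD} id and the /dev/vgroup/lvdb-1 target are placeholders taken from the thread, the OSD must be stopped while ceph-bluestore-tool runs, and the options should be double-checked against your Ceph release.

# stop the OSD and keep data from rebalancing while it is down
ceph osd set noout
systemctl stop ceph-osd@${OSD}

# attach the new SSD-backed DB volume; this alone does not move existing data
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} \
    bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1

# migrate the BlueFS data that still lives on the slow device onto block.db
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} \
    --devs-source /var/lib/ceph/osd/ceph-${OSD}/block \
    --dev-target /var/lib/ceph/osd/ceph-${OSD}/block.db \
    bluefs-bdev-migrate

systemctl start ceph-osd@${OSD}
ceph osd unset noout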
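
To check whether the migration actually moved the data, and to compare converted and freshly created OSDs under the same conditions, checks along these lines should help (again a sketch: osd.${OSD} is a placeholder and the bluefs perf counter names assume a Nautilus-era OSD):

# remaining spillover shows up as a BLUEFS_SPILLOVER health warning
ceph health detail | grep -i spillover

# on the OSD host: BlueFS bytes on the fast DB device vs. the slow device
ceph daemon osd.${OSD} perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'

# the same write benchmark used earlier in the thread
ceph tell osd.${OSD} bench -f plain 12288000 4096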