Re: adding block.db to OSD

Hello Igor,

On 30.04.20 at 15:52, Igor Fedotov wrote:
> 1) reset perf counters for the specific OSD
> 
> 2) run bench
> 
> 3) dump perf counters.
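
A minimal sketch of these three steps, assuming they are run on the host
carrying the OSD so its admin socket is reachable (osd.0 is just an example
id, and the output path is arbitrary):

# ceph daemon osd.0 perf reset all
# ceph tell osd.0 bench -f plain 12288000 4096
# ceph daemon osd.0 perf dump > /tmp/osd.0-perf.json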

This is OSD 0:

# ceph tell osd.0 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 6.70482 sec at 1.7 MiB/sec 447
IOPS

https://pastebin.com/raw/hbKcU07g

This is OSD 38:

# ceph tell osd.38 bench -f plain 12288000 4096
bench: wrote 12 MiB in blocks of 4 KiB in 2.01763 sec at 5.8 MiB/sec
1.49k IOPS

https://pastebin.com/raw/Tx2ckVm1

> Collecting the disks' (both main and DB) activity with iostat would be
> nice too. But please either increase the benchmark duration or reduce
> the iostat probe period to 0.1 or 0.05 seconds.
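
A possible iostat invocation for this, run on the OSD host while the bench
is going (sdX stands for the HDD and nvme0n1 for the block.db SSD, both
placeholders; if the installed sysstat does not accept sub-second intervals,
a 1-second interval combined with a longer bench run should do):

# iostat -xmt sdX nvme0n1 1 60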

This gives me:

# ceph tell osd.38 bench -f plain 122880000 4096
Error EINVAL: 'count' values greater than 12288000 for a block size of 4
KiB, assuming 100 IOPS, for 30 seconds, can cause ill effects on osd.
Please adjust 'osd_bench_small_size_max_iops' with a higher value if you
wish to use a higher 'count'.
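
If the longer run is wanted, the cap mentioned in the error can presumably be
raised temporarily before re-running the bench (the value 2048 is only an
illustration):

# ceph tell osd.38 injectargs '--osd_bench_small_size_max_iops=2048'
# ceph tell osd.38 bench -f plain 122880000 4096

Alternatively, 'ceph config set osd osd_bench_small_size_max_iops 2048' would
make the change persistent.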

Stefan

> 
> 
> Thanks,
> 
> Igor
> 
> On 4/28/2020 8:42 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> but the performance issue is still present even on the recreated OSD.
>>
>> # ceph tell osd.38 bench -f plain 12288000 4096
>> bench: wrote 12 MiB in blocks of 4 KiB in 1.63389 sec at 7.2 MiB/sec
>> 1.84k IOPS
>>
>> vs.
>>
>> # ceph tell osd.10 bench -f plain 12288000 4096
>> bench: wrote 12 MiB in blocks of 4 KiB in 10.7454 sec at 1.1 MiB/sec 279
>> IOPS
>>
>> both backed by the same SAMSUNG SSD as block.db.
>>
>> Greets,
>> Stefan
>>
>> On 28.04.20 at 19:12, Stefan Priebe - Profihost AG wrote:
>>> Hi Igor,
>>> On 27.04.20 at 15:03, Igor Fedotov wrote:
>>>> Just left a comment at https://tracker.ceph.com/issues/44509
>>>>
>>>> Generally, bdev-new-db performs no migration; RocksDB might eventually
>>>> do that, but there is no guarantee it moves everything.
>>>>
>>>> One should use bluefs-bdev-migrate to do actual migration.
>>>>
>>>> And I think that's the root cause for the above ticket.
>>> perfect - this removed all spillover in seconds.
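
For reference, a sketch of what that migration step looks like with
ceph-bluestore-tool, assuming osd.38 and the default block/block.db symlinks
under the OSD data directory as placeholders; the OSD has to be stopped while
the tool runs:

# systemctl stop ceph-osd@38
# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-38 \
      --devs-source /var/lib/ceph/osd/ceph-38/block \
      --dev-target /var/lib/ceph/osd/ceph-38/block.db \
      bluefs-bdev-migrate
# systemctl start ceph-osd@38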
>>>
>>> Greets,
>>> Stefan
>>>
>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 4/24/2020 2:37 PM, Stefan Priebe - Profihost AG wrote:
>>>>> No, not a standalone WAL. I wanted to ask whether bdev-new-db migrated
>>>>> the DB and WAL from HDD to SSD.
>>>>>
>>>>> Stefan
>>>>>
>>>>>> On 24.04.2020 at 13:01, Igor Fedotov <ifedotov@xxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unless you have 3 different types of disks behind the OSD (e.g. HDD,
>>>>>> SSD, NVMe), a standalone WAL makes no sense.
>>>>>>
>>>>>>
>>>>>> On 4/24/2020 1:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Is a WAL device missing? Do I need to run *bluefs-bdev-new-db* and
>>>>>>> *bluefs-bdev-new-wal*?
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>>> On 24.04.2020 at 11:32, Stefan Priebe - Profihost AG
>>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Igor,
>>>>>>>>
>>>>>>>> there must be a difference. I purged osd.0 and recreated it.
>>>>>>>>
>>>>>>>> Now it gives:
>>>>>>>> ceph tell osd.0 bench
>>>>>>>> {
>>>>>>>>     "bytes_written": 1073741824,
>>>>>>>>     "blocksize": 4194304,
>>>>>>>>     "elapsed_sec": 8.1554735639999993,
>>>>>>>>     "bytes_per_sec": 131659040.46819863,
>>>>>>>>     "iops": 31.389961354303033
>>>>>>>> }
>>>>>>>>
>>>>>>>> What's wrong with adding a block.db device later?
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hi,
>>>>>>>>> if the OSDs are idle, the difference is even worse:
>>>>>>>>> # ceph tell osd.0 bench
>>>>>>>>> {
>>>>>>>>>      "bytes_written": 1073741824,
>>>>>>>>>      "blocksize": 4194304,
>>>>>>>>>      "elapsed_sec": 15.396707875000001,
>>>>>>>>>      "bytes_per_sec": 69738403.346825853,
>>>>>>>>>      "iops": 16.626931034761871
>>>>>>>>> }
>>>>>>>>> # ceph tell osd.38 bench
>>>>>>>>> {
>>>>>>>>>      "bytes_written": 1073741824,
>>>>>>>>>      "blocksize": 4194304,
>>>>>>>>>      "elapsed_sec": 6.8903985170000004,
>>>>>>>>>      "bytes_per_sec": 155831599.77624846,
>>>>>>>>>      "iops": 37.153148597776521
>>>>>>>>> }
>>>>>>>>> Stefan
>>>>>>>>> On 23.04.20 at 14:39, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> On 23.04.20 at 14:06, Igor Fedotov wrote:
>>>>>>>>>>> I don't recall any additional tuning to be applied to the new DB
>>>>>>>>>>> volume. And I assume the hardware is pretty much the same...
>>>>>>>>>>>
>>>>>>>>>>> Do you still have any significant amount of data spilled over
>>>>>>>>>>> for these updated OSDs? If not, I don't have any valid
>>>>>>>>>>> explanation for the phenomenon.
>>>>>>>>>> just the 64k from here:
>>>>>>>>>> https://tracker.ceph.com/issues/44509
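
A rough way to see how much, if anything, still sits on the slow device is to
look at the bluefs perf counters on the OSD host (osd.0 is just an example
id; the counter names may differ slightly between releases):

# ceph daemon osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'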
>>>>>>>>>>
>>>>>>>>>>> You might want to try "ceph osd bench" to compare OSDs under
>>>>>>>>>>> pretty much the same load. Any difference observed?
>>>>>>>>>> The servers are the same HW. OSD bench gives:
>>>>>>>>>> # ceph tell osd.0 bench
>>>>>>>>>> {
>>>>>>>>>>       "bytes_written": 1073741824,
>>>>>>>>>>       "blocksize": 4194304,
>>>>>>>>>>       "elapsed_sec": 16.091414781000001,
>>>>>>>>>>       "bytes_per_sec": 66727620.822242722,
>>>>>>>>>>       "iops": 15.909104543266945
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> # ceph tell osd.36 bench
>>>>>>>>>> {
>>>>>>>>>>       "bytes_written": 1073741824,
>>>>>>>>>>       "blocksize": 4194304,
>>>>>>>>>>       "elapsed_sec": 10.023828538,
>>>>>>>>>>       "bytes_per_sec": 107118933.6419194,
>>>>>>>>>>       "iops": 25.539143953780986
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G
>>>>>>>>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G
>>>>>>>>>>
>>>>>>>>>> The SSDs are all the same, like the rest of the HW, and both drives
>>>>>>>>>> should give the same performance according to their specs. The only
>>>>>>>>>> other difference is that OSD 36 was created directly with the
>>>>>>>>>> block.db device (Nautilus 14.2.7), while OSD 0 (14.2.8) was not.
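
One way to double-check where each OSD's DB actually lives is the OSD
metadata (osd ids 0 and 36 are the ones compared above; the bluefs_db_* and
bluefs_dedicated_db fields presumably describe the dedicated DB device, and
their exact names may vary by release):

# ceph osd metadata 0 | grep -E 'bluefs_dedicated_db|bluefs_db_'
# ceph osd metadata 36 | grep -E 'bluefs_dedicated_db|bluefs_db_'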
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> is there anything else needed besides running:
>>>>>>>>>>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD}
>>>>>>>>>>>> bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>>>>>>>>>>>
>>>>>>>>>>>> I did so some weeks ago, and currently I'm seeing that all OSDs
>>>>>>>>>>>> originally deployed with --block-db show 10-20% I/O wait, while
>>>>>>>>>>>> all those converted using ceph-bluestore-tool show 80-100%
>>>>>>>>>>>> I/O wait.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, is there some tuning available to use more of the SSD? The
>>>>>>>>>>>> SSD (block.db) is only 0-2% utilized.
>>>>>>>>>>>>
>>>>>>>>>>>> Greets,
>>>>>>>>>>>> Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



