Hi Igor,

could it be the fact that there are those 64kb of spilled-over metadata I can't get rid of?

Stefan

On 24.04.20 at 13:08, Igor Fedotov wrote:
> Hi Stefan,
>
> that's not a 100% pure experiment. A fresh OSD might be faster by itself,
> e.g. due to lack of space fragmentation and/or empty lookup tables.
>
> You might want to recreate OSD.0 without DB and attach the DB manually, then
> benchmark the resulting OSD.
>
> A different experiment, if you have another slow OSD with a recently added
> DB, would be to:
>
> Compare benchmark results for both bitmap and stupid allocators for this
> specific OSD, i.e. benchmark it as-is, then change
> bluestore_allocator/bluefs_allocator to stupid and benchmark again.
>
> And just in case - I presume all the benchmark results are persistent,
> i.e. you see the same results over multiple runs.
>
> Thanks,
>
> Igor
>
> On 4/24/2020 12:32 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> there must be a difference. I purged osd.0 and recreated it.
>>
>> Now it gives:
>> ceph tell osd.0 bench
>> {
>>     "bytes_written": 1073741824,
>>     "blocksize": 4194304,
>>     "elapsed_sec": 8.1554735639999993,
>>     "bytes_per_sec": 131659040.46819863,
>>     "iops": 31.389961354303033
>> }
>>
>> What's wrong with adding a block.db device later?
>>
>> Stefan
>>
>> On 23.04.20 at 20:34, Stefan Priebe - Profihost AG wrote:
>>> Hi,
>>>
>>> if the OSDs are idle the difference is even worse:
>>>
>>> # ceph tell osd.0 bench
>>> {
>>>     "bytes_written": 1073741824,
>>>     "blocksize": 4194304,
>>>     "elapsed_sec": 15.396707875000001,
>>>     "bytes_per_sec": 69738403.346825853,
>>>     "iops": 16.626931034761871
>>> }
>>>
>>> # ceph tell osd.38 bench
>>> {
>>>     "bytes_written": 1073741824,
>>>     "blocksize": 4194304,
>>>     "elapsed_sec": 6.8903985170000004,
>>>     "bytes_per_sec": 155831599.77624846,
>>>     "iops": 37.153148597776521
>>> }
>>>
>>> Stefan
>>>
>>> On 23.04.20 at 14:39, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> On 23.04.20 at 14:06, Igor Fedotov wrote:
>>>>> I don't recall any additional tuning to be applied to a new DB
>>>>> volume. And I assume the hardware is pretty much the same...
>>>>>
>>>>> Do you still have any significant amount of data spilled over for
>>>>> these updated OSDs? If not, I don't have any valid explanation for
>>>>> the phenomenon.
>>>>
>>>> Just the 64k from here:
>>>> https://tracker.ceph.com/issues/44509
>>>>
>>>>> You might want to try "ceph osd bench" to compare OSDs under pretty
>>>>> much the same load. Any difference observed?
>>>>
>>>> The servers are the same HW. OSD bench is:
>>>>
>>>> # ceph tell osd.0 bench
>>>> {
>>>>     "bytes_written": 1073741824,
>>>>     "blocksize": 4194304,
>>>>     "elapsed_sec": 16.091414781000001,
>>>>     "bytes_per_sec": 66727620.822242722,
>>>>     "iops": 15.909104543266945
>>>> }
>>>>
>>>> # ceph tell osd.36 bench
>>>> {
>>>>     "bytes_written": 1073741824,
>>>>     "blocksize": 4194304,
>>>>     "elapsed_sec": 10.023828538,
>>>>     "bytes_per_sec": 107118933.6419194,
>>>>     "iops": 25.539143953780986
>>>> }
>>>>
>>>> OSD 0 is a Toshiba MG07SCA12TA SAS 12G,
>>>> OSD 36 is a Seagate ST12000NM0008-2H SATA 6G.
>>>>
>>>> The SSDs are all the same, like the rest of the HW, and both drives
>>>> should give the same performance according to their specs. The only other
>>>> difference is that OSD 36 was directly created with the block.db
>>>> device (Nautilus 14.2.7) while OSD 0 (14.2.8) was not.
>>>>
>>>> Stefan
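For reference, how much BlueFS metadata has actually spilled over onto the slow device can be checked per OSD. This is only a rough sketch, assuming a Nautilus (14.2.x) cluster, osd.0 as the example OSD and the default admin socket setup; counter names are taken from the bluefs section of the perf dump:

# spillover is also reported as a BLUEFS_SPILLOVER health warning
ceph health detail | grep -i spillover

# db_used_bytes vs db_total_bytes shows how full the fast DB device is;
# slow_used_bytes is the amount of BlueFS data sitting on the slow device
ceph daemon osd.0 perf dump | grep -E '"db_total_bytes"|"db_used_bytes"|"slow_used_bytes"'

A slow_used_bytes of 65536 would correspond to the 64k discussed above.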
>>>>>
>>>>> On 4/23/2020 8:35 AM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello,
>>>>>>
>>>>>> is there anything else needed besides running:
>>>>>>
>>>>>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} \
>>>>>>     bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1
>>>>>>
>>>>>> I did so some weeks ago and currently I'm seeing that all OSDs
>>>>>> originally deployed with --block-db show 10-20% I/O waits, while
>>>>>> all those that were converted using ceph-bluestore-tool show
>>>>>> 80-100% I/O waits.
>>>>>>
>>>>>> Also, is there some tuning available to use more of the SSD? The
>>>>>> SSD (block-db) is only saturated at 0-2%.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
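For completeness, a rough sketch of the conversion and follow-up steps discussed in this thread, assuming a Nautilus (14.2.x) cluster, osd.0 as the example OSD and /dev/vgroup/lvdb-1 as the new DB volume (names taken from the mails above, not verified against any particular setup). The bluefs-bdev-migrate step is what moves data that has already spilled onto the slow device over to the new DB volume; the last block is Igor's allocator comparison:

# stop the OSD before touching its BlueStore volumes
systemctl stop ceph-osd@0

# attach a dedicated DB volume to the existing OSD (the step from the original mail)
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 \
    bluefs-bdev-new-db --dev-target /dev/vgroup/lvdb-1

# move BlueFS data that still lives on the main (slow) device onto the
# new DB volume, so nothing stays spilled over
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 \
    bluefs-bdev-migrate --devs-source /var/lib/ceph/osd/ceph-0/block \
    --dev-target /var/lib/ceph/osd/ceph-0/block.db

systemctl start ceph-osd@0

# Igor's experiment: benchmark as-is, switch to the stupid allocator,
# restart the OSD (the allocator option only takes effect on start) and
# benchmark again
ceph tell osd.0 bench
ceph config set osd.0 bluestore_allocator stupid
ceph config set osd.0 bluefs_allocator stupid
systemctl restart ceph-osd@0
ceph tell osd.0 bench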