Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS

Dear Maged,

Do you mean dm-writecache is better than bcache in terms of small-IO performance? By how much? Could you please share a bit more detail with us?

thanks in advance,

Samuel





huxiaoyu@xxxxxxxxxxxx
 
From: Maged Mokhtar
Date: 2020-09-18 02:12
To: ceph-users
Subject:  Re: Benchmark WAL/DB on SSD and HDD for RGW RBD CephFS
 
On 17/09/2020 19:21, vitalif@xxxxxxxxxx wrote:
>   RBD in fact doesn't benefit much from a WAL/DB partition alone, because Bluestore never does more writes per second than the HDD can do on average (it flushes every 32 writes to the HDD). For RBD, the best thing is bcache.
 
rbd will benefit: for each data write iop, there could be a metadata read
iop (unless it is cached) plus a metadata write iop, so taking these extra
metadata iops away from the HDD will make a difference for small block
sizes. Even for the deferred data flushes (not sure if the batch is 32 or
64), if the data is not totally random, the HDD's io scheduler (cfq or
deadline) will either merge blocks or order them in a way that can
sustain higher client iops.
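
If you want to reproduce that small-block pattern yourself, a minimal
sketch with the librbd Python bindings looks roughly like the following;
the pool name 'rbd', image 'bench-img', block size and op count are
placeholders rather than values anyone in this thread used, and fio's rbd
ioengine gives the same kind of numbers with less code.

    import random
    import time

    import rados
    import rbd

    # connect using the standard admin config/keyring (adjust for your cluster)
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')        # placeholder pool name
    image = rbd.Image(ioctx, 'bench-img')    # placeholder, pre-created test image

    block = 4096                             # small block size, where metadata iops dominate
    ops = 10000
    data = b'\x00' * block
    size = image.size()

    # synchronous writes, i.e. queue depth 1
    start = time.time()
    for _ in range(ops):
        off = random.randrange(0, size - block)
        off -= off % block                   # align writes to the block size
        image.write(data, off)
    elapsed = time.time() - start
    print("%.0f random write IOPS at %d B" % (ops / elapsed, block))

    image.close()
    ioctx.close()
    cluster.shutdown()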
 
we did test dm-cache, bcache and dm-writecache; we found the latter to be
much better.
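
If you want to try dm-writecache yourself, recent LVM (2.03+) can attach
one to an HDD-backed LV before the OSD is deployed on it; the sketch below
just wraps the two LVM commands in Python, and every name in it (vg0,
osd-data, osd-wcache, /dev/nvme0n1, the 50G size) is a placeholder, not a
recommended layout.

    import subprocess

    # Placeholder names: volume group "vg0", HDD-backed LV "osd-data",
    # fast device "/dev/nvme0n1" already added to the VG (vgextend vg0 /dev/nvme0n1).
    VG, SLOW_LV, FAST_DEV, CACHE_SIZE = "vg0", "osd-data", "/dev/nvme0n1", "50G"

    def run(*cmd):
        # echo the command, then fail loudly if it errors
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # carve a cache LV out of the fast device ...
    run("lvcreate", "-n", "osd-wcache", "-L", CACHE_SIZE, VG, FAST_DEV)
    # ... and attach it to the slow LV as a dm-writecache
    run("lvconvert", "--type", "writecache", "--cachevol", "osd-wcache", VG + "/" + SLOW_LV)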
 
/Maged
 
 
>
> Just try to fill your OSDs up to a decent level to see the difference, because a lot of objects means a lot of metadata, and when there's a lot of metadata it stops fitting in cache. The performance, and the performance difference, will also depend on whether your HDDs have an internal SSD/media cache (a lot of them do, even if you're unaware of it).
>
> +1 for hsbench, just be careful and use my repo https://github.com/vitalif/hsbench because the original has at least 2 bugs for now:
> 1) it only reads the first 64KB when benchmarking GETs
> 2) it reads objects sequentially instead of reading them randomly
>
> The first one already has a fix waiting to be merged in someone's pull request; the second one is my fix, I can submit a PR later.
>
>> Yes, I agree that there are many knobs for fine-tuning Ceph performance.
>> The problem is we don't have data on which workloads benefit most from
>> WAL/DB on SSD versus on the same spinning drive, and by how much. Does
>> it really help in a cluster that is mostly for object storage/RGW? Or is
>> it maybe just block storage/RBD workloads that benefit most?
>>
>> IMHO, I think we need some cost-benefit analysis for this, because the
>> cost of placing the WAL/DB on SSD is quite noticeable (multiple OSDs
>> fail when the SSD fails, and capacity is reduced).
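
As a quick sanity check for the two hsbench issues quoted above (GETs that
stop after 64KB, and sequential instead of random key order), something
like this boto3 sketch reads whole objects in random order from an RGW
endpoint; the endpoint URL, credentials and bucket name are placeholders.

    import random
    import time

    import boto3

    # placeholder endpoint, credentials and bucket -- adjust for your RGW setup
    s3 = boto3.client("s3",
                      endpoint_url="http://rgw.example.com:7480",
                      aws_access_key_id="ACCESS",
                      aws_secret_access_key="SECRET")
    bucket = "bench-bucket"

    # the first page of keys (up to 1000) is enough for a sanity check
    keys = [o["Key"] for o in s3.list_objects_v2(Bucket=bucket).get("Contents", [])]
    random.shuffle(keys)          # random order, unlike the buggy sequential reads

    start, nbytes = time.time(), 0
    for key in keys:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # read the whole object
        nbytes += len(body)
    elapsed = time.time() - start
    print("%d objects, %.1f MB/s" % (len(keys), nbytes / elapsed / 1e6))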
 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


