Re: librbd 4k read/write?

Yep. Remember that most Ceph clusters serve a number of simultaneous clients, so the “IO blender” effect presents what is more or less a random workload to the drives. Dedicated, single-client, node-local drives might benefit from such strategies, but gymnastics like this for uncertain gain really just reinforce the point that HDDs are a false economy.

Have we established for sure yet whether the OP’s client is a VM or not? This still smells a lot like IOPS throttling.
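If it is a VM, one quick check from the hypervisor side is to look for libvirt/QEMU block-device throttling. A minimal sketch, assuming a libvirt-managed guest; “myvm” and “vda” are placeholders for the actual domain name and target device:

  # Query the current I/O tuning applied to the guest's disk
  virsh blkdeviotune myvm vda

Non-zero total_iops_sec / read_iops_sec / write_iops_sec values in that output would explain a hard ceiling on small-block IOPS regardless of what the RBD backend can deliver.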

> On Aug 13, 2023, at 7:51 AM, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:
> 
> 
> On 12/08/2023 13:04, Marc wrote:
>>> To allow for faster linear reads and writes, please create a file,
>>> /etc/udev/rules.d/80-rbd.rules, with the following contents (assuming
>>> that the VM sees the RBD as /dev/sda):
>>> 
>>> KERNEL=="sda", ENV{DEVTYPE}=="disk", ACTION=="add|change",
>>> ATTR{bdi/read_ahead_kb}="32768"
>>> 
>>> Or test it without any udev rule like this:
>>> 
>>> blockdev --setra 65536 /dev/sda
>>> 
>>> The difference in numbers is because one is in kilobytes and one is in
>>> 512-byte sectors.
>> 
>>> Mandatory warning: this setting can hurt other workloads.
>>> 
>> Such as what workloads? Would the effects show up in the average-latency RBD overview, or is it better to monitor somewhere else?
> 
> 32 MB of read-ahead is excessive for the vast majority of workloads. At that size the HDD would only manage on the order of 10 ops per second, so any random reads will perform very badly. Unless you write your own RADOS application with full control over large sequential writes and reads, I would say it is very risky. If you store small objects, like files on CephFS, the scrub load could kill your HDDs. Even with large objects, depending on how and when they were written or updated, their physical extents on disk may not be contiguous even where the logical extents are sequential, and could take several ops to read. I would not go above 1 MB of read-ahead for HDDs, as that will not hurt your random IOPS too much.
> 
> Generally, if you are looking for 4k IOPS performance like the original poster, you really should consider SSD/NVMe.
> 
>  /maged
> 
> 
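For anyone who wants to try the more conservative read-ahead Maged suggests, a minimal sketch using the same tools quoted above; /dev/sda stands in for whatever device name the guest sees the RBD as:

  # Current read-ahead, in 512-byte sectors (blockdev) and in kilobytes (sysfs)
  blockdev --getra /dev/sda
  cat /sys/block/sda/bdi/read_ahead_kb

  # Roughly 1 MB of read-ahead: 2048 sectors x 512 bytes = 1 MiB
  blockdev --setra 2048 /dev/sda

As with the 32 MB example earlier in the thread, the udev-rule variant would use ATTR{bdi/read_ahead_kb}="1024" instead, so the value persists across reboots.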
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



