Re: Unexpected slow read for HDD cluster (good write speed)

Yes, during my last adventure of trying to get any reasonable
performance out of Ceph, I realized my testing methodology was wrong.
Both the kernel client and QEMU have queues everywhere that make the
numbers hard to interpret.

fio has native RBD support, which gives more useful values.

https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio

Frustratingly, they are also much lower, showing just how slow Ceph actually is.
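
Something along these lines is a reasonable starting point (the pool and
image names below are just placeholders; the image has to exist and
already contain data, and fio needs to be built with librbd support):

  fio --name=rbd-read-test \
      --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
      --rw=read --bs=4M --iodepth=16 \
      --runtime=60 --time_based

That does 4M sequential reads at queue depth 16 directly through librbd,
so there is no kernel page cache or QEMU queue in the way. Adjust bs and
iodepth to whatever access pattern you actually care about.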

On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
<work.ceph.user.mailing@xxxxxxxxx> wrote:
>
> Hello guys!
>
> I would like to ask if somebody has already experienced a similar
> situation. We have a new cluster with 5 nodes with the following setup:
>
>    - 128 GB of RAM
>    - 2 Intel(R) Xeon Silver 4210R CPUs
>    - 1 NVMe of 2 TB for RocksDB caching
>    - 5 HDDs of 14 TB
>    - 1 dual-port 25 Gbps NIC in bond mode.
>
>
> We are starting with a single dual-port NIC (the bond provides 50 Gbps in
> total). The design allows adding a second NIC and creating a new bond, to
> which we intend to offload the cluster network. Therefore, we have already
> configured separate VLANs and networks for Ceph's public and cluster
> traffic.
>
>
> We are using Ubuntu 20.04 with Ceph Octopus, deployed in the standard way
> we are used to. During our initial validation and evaluation of the
> cluster, we reach write speeds between 250-300 MB/s, which is in the
> ballpark for this kind of setup with HDDs and an NVMe RocksDB cache (in
> our experience). However, the issue is reads: while reading, we barely hit
> 100 MB/s, whereas we would expect at least something close to the write
> speed. These tests are being performed on a pool with a replication factor
> of 3.
>
>
> We have already checked the disks, and they all seem to be reading just
> fine. The network does not seem to be the bottleneck either (checked with
> atop while reading/writing to the cluster).
>
>
> Have you guys ever encountered similar situations? Do you have any tips for
> us to proceed with the troubleshooting?
>
>
> We suspect that we are missing some small tuning detail that affects only
> read performance, but so far we have not been able to pinpoint it. Any help
> would be much appreciated :)
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



-- 
+4916093821054
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



