On Fri, Aug 29, 2014 at 10:37 AM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:
> Thanks Haomai !
>
> Here is some of the data from my setup.
>
> ----------------------------------------------------------------------
>
> Set up:
> --------
>
> 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data) ->
> one OSD. 5 client machines with 12-core CPUs, each running two instances of
> ceph_smalliobench (10 clients total). Network is 10GbE.
>
> Workload:
> -------------
>
> Small workload: 20K objects of 4K size, io_size is also 4K RR. The
> intent is to serve the IOs from memory so that it can uncover the
> performance problems within a single OSD.
>
> Results from Firefly:
> --------------------------
>
> Single-client throughput is ~14K IOPS, but as the number of clients
> increases the aggregated throughput does not increase: 10 clients -> ~15K
> IOPS. ~9-10 CPU cores are used.
>
> Results with latest master:
> ------------------------------
>
> Single client is ~14K IOPS, and it scales as the number of clients
> increases: 10 clients -> ~107K IOPS. ~25 CPU cores are used.
>
> ----------------------------------------------------------------------
>
> More realistic workload:
> -----------------------------
>
> Let's see how it performs while > 90% of the IOs are served from disks.
>
> Setup:
> -------
>
> 40-CPU-core server as a cluster node (single-node cluster) with 64 GB RAM.
> 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node for
> the client running fio/vdbench. 4 RBDs are configured with the 'noshare'
> option. 40GbE network.
>
> Workload:
> ------------
>
> 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K RR.
>
> Results from Firefly:
> ------------------------
>
> Aggregated output while 4 RBD clients stress the cluster in parallel is
> ~20-25K IOPS, with ~8-10 CPU cores used (maybe less, can't remember
> precisely).
>
> Results from latest master:
> --------------------------------
>
> Aggregated output while 4 RBD clients stress the cluster in parallel is
> ~120K IOPS; the CPU is 7% idle, i.e. ~37-38 CPU cores used.
>
> Hope this helps.

Thanks Roy, the results are very promising! Just two questions: do the CPU
core counts above refer to HT (logical) cores, or did you normalize them to
physical cores? And what was the disk I/O utilization (percentage of I/O
time) in this test, if you measured it?
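
For reference, the 4K random-read fio run described above could be
approximated with a minimal job file like the one below. This is only a
sketch: the actual job file isn't included in the mail, so the device names,
queue depth and runtime are assumptions, and it presumes the four images are
mapped via krbd with the 'noshare' option (e.g. rbd map <image> -o noshare)
so each image gets its own client instance.

[global]
; 4K random reads with direct I/O, matching the workload described above
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
runtime=300
time_based
group_reporting

; one job per mapped RBD image (device names are assumptions)
[rbd0]
filename=/dev/rbd0
[rbd1]
filename=/dev/rbd1
[rbd2]
filename=/dev/rbd2
[rbd3]
filename=/dev/rbd3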