On Fri, Aug 29, 2014 at 4:03 PM, Andrey Korolyov <andrey at xdel.ru> wrote:
> On Fri, Aug 29, 2014 at 10:37 AM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:
> > Thanks Haomai !
> >
> > Here is some of the data from my setup.
> >
> > ----------------------------------------------------------------------
> >
> > Setup:
> > --------
> >
> > 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data)
> > -> one OSD. 5 client machines, each with a 12-core CPU and running two
> > instances of ceph_smalliobench (10 clients total). Network is 10GbE.
> >
> > Workload:
> > -------------
> >
> > Small workload - 20K objects of 4K size; io_size is also 4K random read.
> > The intent is to serve the IOs from memory so that it uncovers the
> > performance problems within a single OSD.
> >
> > Results from Firefly:
> > --------------------------
> >
> > Single-client throughput is ~14K IOPS, but the aggregated throughput does
> > not increase as the number of clients grows: 10 clients -> ~15K IOPS.
> > ~9-10 CPU cores are used.
> >
> > Results with latest master:
> > ------------------------------
> >
> > A single client is still ~14K IOPS, but it scales as the number of
> > clients increases: 10 clients -> ~107K IOPS. ~25 CPU cores are used.
> >
> > ----------------------------------------------------------------------
> >
> > More realistic workload:
> > -----------------------------
> >
> > Let's see how it performs while > 90% of the IOs are served from disks.
> >
> > Setup:
> > -------
> >
> > 40-core server as the cluster node (single-node cluster) with 64 GB RAM.
> > 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node for
> > the client running fio/vdbench. 4 rbds are configured with the 'noshare'
> > option. 40GbE network.
> >
> > Workload:
> > ------------
> >
> > All 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K
> > random read.
> >
> > Results from Firefly:
> > ------------------------
> >
> > Aggregated output while 4 rbd clients stress the cluster in parallel is
> > ~20-25K IOPS; ~8-10 CPU cores are used (maybe less, I can't remember
> > precisely).

Good job! I would like to try it later.

> > Results from latest master:
> > --------------------------------
> >
> > Aggregated output while 4 rbd clients stress the cluster in parallel is
> > ~120K IOPS; the CPU is 7% idle, i.e. ~37-38 CPU cores are used.
> >
> > Hope this helps.
>
> Thanks Roy, the results are very promising!
>
> Just two questions - were the numbers above counted against HT (logical)
> cores, or did you normalize them to physical cores? And what was the I/O
> time / utilization percentage in this test (if you measured it)?

--
Best Regards,
Wheat
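
For anyone wanting to approximate the small-object workload described above,
here is a minimal sketch using the python-rados bindings. The pool name,
object prefix, and request count are illustrative assumptions; the actual
test used ceph_smalliobench, not this script.

    #!/usr/bin/env python
    # Sketch of the small-object 4K random-read pattern: 20K objects of 4K
    # each, read back with 4K requests so they are served from memory.
    # Pool name and iteration count are assumptions, not the real tool's
    # parameters.
    import random
    import rados

    POOL = 'testpool'       # assumed pool name
    NUM_OBJECTS = 20000     # 20K objects, as in the small workload
    OBJ_SIZE = 4096         # 4K objects, 4K io_size

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)

    # Populate once so subsequent reads can be served from memory.
    for i in range(NUM_OBJECTS):
        ioctx.write_full('smallio-%d' % i, b'\0' * OBJ_SIZE)

    # 4K random reads across the object set.
    for _ in range(100000):
        name = 'smallio-%d' % random.randrange(NUM_OBJECTS)
        data = ioctx.read(name, length=OBJ_SIZE, offset=0)
        assert len(data) == OBJ_SIZE

    ioctx.close()
    cluster.shutdown()

Running several copies of such a loop from multiple client machines is what
exposes the single-OSD scaling behaviour compared above.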
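
Similarly, a minimal sketch of the rbd-level 4K random-read pattern from the
second test, using the python-rbd bindings. Pool and image names are
assumptions, and the actual measurements were taken with fio/vdbench against
rbd images, not with this script.

    #!/usr/bin/env python
    # Sketch of the 4K random-read pattern over a populated rbd image, so
    # most reads miss the cache and are served from the SSDs.
    import random
    import rados
    import rbd

    POOL = 'rbd'            # assumed pool name
    IMAGE = 'testimg'       # assumed image name
    IO_SIZE = 4096          # io_size = 4K

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    image = rbd.Image(ioctx, IMAGE)

    size = image.size()
    blocks = size // IO_SIZE

    # 4K random reads spread across the whole image.
    for _ in range(100000):
        offset = random.randrange(blocks) * IO_SIZE
        data = image.read(offset, IO_SIZE)
        assert len(data) == IO_SIZE

    image.close()
    ioctx.close()
    cluster.shutdown()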