On Fri, Aug 29, 2014 at 10:37 AM, Somnath Roy <Somnath.Roy at sandisk.com> wrote:
> Thanks Haomai !
>
> Here is some of the data from my setup.
>
> ----------------------------------------------------------------------
>
> Set up:
> --------
>
> 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data) ->
> one OSD. 5 client machines with 12-core CPUs, each running two instances of
> ceph_smalliobench (10 clients total). Network is 10GbE.
>
> Workload:
> -------------
>
> Small workload: 20K objects of 4K size, io_size is also 4K RR. The
> intent is to serve the IOs from memory so that it can uncover the
> performance problems within a single OSD.
>
> Results from Firefly:
> --------------------------
>
> Single-client throughput is ~14K IOPS, but as the number of clients
> increases the aggregated throughput does not increase: 10 clients -> ~15K
> IOPS. ~9-10 CPU cores are used.
>
> Results with latest master:
> ------------------------------
>
> Single client is ~14K IOPS, and it scales as the number of clients
> increases: 10 clients -> ~107K IOPS. ~25 CPU cores are used.
>
> ----------------------------------------------------------------------
>
> More realistic workload:
> -----------------------------
>
> Let's see how it performs while > 90% of the IOs are served from disks.
>
> Setup:
> -------
>
> 40-CPU-core server as a cluster node (single-node cluster) with 64 GB RAM.
> 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node for
> the client running fio/vdbench. 4 RBDs are configured with the 'noshare'
> option. 40GbE network.
>
> Workload:
> ------------
>
> 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K RR.
>
> Results from Firefly:
> ------------------------
>
> Aggregated output while 4 RBD clients stress the cluster in parallel is
> ~20-25K IOPS, with ~8-10 CPU cores used (maybe less, can't remember
> precisely).
>
> Results from latest master:
> --------------------------------
>
> Aggregated output while 4 RBD clients stress the cluster in parallel is
> ~120K IOPS; the CPU is 7% idle, i.e. ~37-38 CPU cores used.
>
> Hope this helps.

Thanks Roy, the results are very promising! Just two questions: do the CPU
core counts above refer to HT (logical) cores, or did you normalize them to
physical cores? And what was the disk I/O utilization (percentage of I/O
time) in this test, if you measured it?
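
For reference, the 4K random-read fio run described above could be
approximated with a minimal job file like the one below. This is only a
sketch: the actual job file isn't included in the mail, so the device names,
queue depth and runtime are assumptions, and it presumes the four images are
mapped via krbd with the 'noshare' option (e.g. rbd map <image> -o noshare)
so each image gets its own client instance.

[global]
; 4K random reads with direct I/O, matching the workload described above
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
runtime=300
time_based
group_reporting

; one job per mapped RBD image (device names are assumptions)
[rbd0]
filename=/dev/rbd0
[rbd1]
filename=/dev/rbd1
[rbd2]
filename=/dev/rbd2
[rbd3]
filename=/dev/rbd3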