Thanks for the data, Stephen. Some feedback:

1. I don't think a single OSD can serve 460K read IOPS, irrespective of how many shards/threads you are running. I didn't have your NVMe data earlier :-).. But for 50-60K IOPS SAS SSDs, a single OSD per drive is probably good enough. I hope you also tried increasing the shards/threads to very high values (since you have a lot of CPU left), say 40:2 or 80:1 (try one configuration with 1 thread per shard; it should reduce contention within each shard), or even a lower ratio like 10:2 or 20:1? (See the ceph.conf sketch at the bottom of this message for the knobs involved.)

2. Do you have any data on disk utilization? It would be good to understand how much better single-disk utilization becomes when you run multiple OSDs per drive. Back-calculating from your data: in the 4 OSDs/drive case each OSD is serving ~14K read IOPS, versus ~42K read IOPS with one OSD/drive. So this suggests that two OSDs/drive should be enough to serve similar IOPS in your environment. You are able to extract ~56K IOPS per drive with 4 OSDs versus ~42K in the one-OSD case.

3. The calculation above discards all cache effects, but that's not realistic. You have a total of 128 GB * 5 = 640 GB of RAM. What is your total working set? If there is a lot of cache effect in this run, 4 OSDs/drive (4 XFS filesystems) will benefit more than one OSD/drive. This could be an effect of the total number of OSDs in the cluster, rather than of the number of OSDs needed to saturate a drive.

4. Also, CPU-utilization-wise, you see only ~20% more CPU utilization while running 4x more OSDs.

5. BTW, the worker thread calculation is incorrect: the default is 5:2, so each OSD runs 10 worker threads with the defaults, and the total is 160 worker threads in both the 4 OSDs/drive and the 1 OSD/drive (20:2) configurations.

6. The write data is surprising compared to the default-shard, 1-OSD case; maybe you need to increase the filestore op threads, since you have more data coming into the filestore?

Thanks & Regards,
Somnath

-----Original Message-----
From: Blinick, Stephen L [mailto:stephen.l.blinick@xxxxxxxxx]
Sent: Wednesday, November 11, 2015 12:57 PM
To: ceph-devel@xxxxxxxxxxxxxxx; Mark Nelson; Samuel Just; Kyle Bader; Somnath Roy
Subject: Increasing # Shards vs multi-OSDs per device

Sorry about the microphone issues in the performance meeting today. This is a follow-up to the 11/4 performance meeting where we discussed increasing the worker thread count in the OSDs versus creating multiple OSDs (and partitions/filesystems) per device. We did the high-level experiment and have some results, which I threw into a ppt/pdf and shared here:

http://www.docdroid.net/UbmvGnH/increasing-shards-vs-multiple-osds.pdf.html

Running 20-shard OSDs versus 4 OSDs per device with the default 5 shards yielded about half of the performance improvement for random 4K reads. For writes, performance is actually worse than just 1 OSD per device with the default number of shards. The throttles should be large enough for the 20-shard case, as they are 10x the defaults, but if you see anything we missed, let us know.

I had the cluster moved to the Infernalis release (with JEMalloc) yesterday, so hopefully we'll have some early results on the same 5-node cluster soon.

Thanks,

Stephen
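
The shard/thread ratios discussed in points 1 and 5, and the filestore op threads from point 6, correspond to the ceph.conf options below. This is only a minimal sketch: the option names are the standard FileStore-era OSD settings, but the values shown (the 40:2 ratio and the raised filestore thread count) are just illustrative examples from the discussion, not tuning recommendations.

  [osd]
  # Sharded op work queue: shards x threads-per-shard = worker threads per OSD.
  # Defaults are 5 shards x 2 threads = 10 workers per OSD (the 5:2 in point 5).
  osd_op_num_shards = 40
  osd_op_num_threads_per_shard = 2

  # FileStore backend worker threads (defaults to 2); a candidate to raise when
  # a single OSD per drive has to absorb the full write stream (point 6).
  filestore_op_threads = 8

These per-OSD thread counts multiply by the number of OSDs on the box, which is where the 160-total-worker-threads figure in point 5 comes from for both the 4 OSDs/drive and the 20:2 single-OSD configurations.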