>>Could you describe more about 2x70000 iops?
>>So you mean 8 OSDs, each backed by an SSD, can achieve 140k iops?

It's a small rbd (10G), so most reads hit the buffer cache.
But yes, it's able to deliver 140000 iops with 8 osds.
(I also checked the stats in the ceph cluster to be sure, and I'm not cpu bound on the osd nodes.)

>> 2014-10-31 05:58:34.231037 mon.0 [INF] pgmap v7109: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 560 MB/s rd, 140 kop/s

Here is the ceph.conf of the osd nodes:

[global]
fsid = c29f4643-9577-4671-ae25-59ad14550aba
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
osd_enable_op_tracker = false

>>Is it read or write? Could you give the fio options?

Random read, 4K. Here is the fio config:

[global]
ioengine=aio
invalidate=1
rw=randread
bs=4K
direct=1
numjobs=1
group_reporting=1
size=10G

[test1]
iodepth=64
filename=/dev/rbd/test/test

On 1 client node, I can't reach more than 50000 iops with 6 osds or 70000 iops with 8 osds.
(I tried increasing numjobs to run more fio processes, and also running against 2 different rbd volumes at the same time, but performance is the same.)

>> 2014-10-31 05:57:30.078348 mon.0 [INF] pgmap v7070: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 290 MB/s rd, 74572 op/s

But if I launch the same fio test on another client node, I can reach the same 70000 iops at the same time (see the per-client command sketch further down).

>> 2014-10-31 05:58:34.231037 mon.0 [INF] pgmap v7109: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 560 MB/s rd, 140 kop/s


----- Original Message -----

From: "Haomai Wang" <haomaiwang@xxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Thursday, October 30, 2014 18:05:26
Subject: Re: krbd blk-mq support ?

Could you describe more about 2x70000 iops?

So you mean 8 OSDs, each backed by an SSD, can achieve 140k iops?

Is it read or write? Could you give the fio options?

On Fri, Oct 31, 2014 at 12:01 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>>I'll try to add more OSDs next week; if it scales, it's very good news!
>
> I just tried adding 2 more osds.
>
> I can now reach 2x 70000 iops on 2 client nodes (vs 2x 50000 previously).
>
> And kworker cpu usage is also lower (84% vs 97%).
> (I don't understand exactly why.)
>
> So, thanks for the help, everybody!
>
>
> ----- Original Message -----
>
> From: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> Cc: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, October 30, 2014 09:11:11
> Subject: Re: krbd blk-mq support ?
>
>>>Hmm, this is probably the messenger.c worker then that is feeding messages
>>>to the network. How many OSDs do you have? It should be able to scale
>>>with the number of OSDs.
>
> Thanks, Sage, for your reply.
>
> Currently 6 OSDs (ssd) on the test platform.
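
(For reference, each client maps the image with the kernel rbd client and runs fio against the resulting block device. A minimal sketch, assuming the test/test pool/image name from the fio job above; the exact invocation and the job-file name are not in the thread, they are only illustrative:

  rbd map test/test          # exposes the image as /dev/rbd/test/test via the udev rules
  fio randread-4k.fio        # the 4K randread job file quoted at the top of this mail
)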
>
> But I can reach 2x 50000 iops on the same rbd volume with 2 clients on 2 different hosts.
> Do you think the messenger.c worker can be the bottleneck in this case?
>
> I'll try to add more OSDs next week; if it scales, it's very good news!
>
>
> ----- Original Message -----
>
> From: "Sage Weil" <sage@xxxxxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, October 29, 2014 16:00:56
> Subject: Re: krbd blk-mq support ?
>
> On Wed, 29 Oct 2014, Alexandre DERUMIER wrote:
>> >>Oh, that's without the blk-mq patch?
>>
>> Yes, sorry, I don't know how to use perf with a custom-compiled kernel.
>> (Usually I'm using perf from debian, with the linux-tools package provided with the debian kernel package.)
>>
>> >>Either way the profile doesn't really sum up to a fully used up cpu.
>>
>> But I see mostly the same behaviour with or without the blk-mq patch; I always have 1 kworker at around 97-100% cpu (1 core) for 50000 iops.
>>
>> I also tried mapping the rbd volume with nocrc; it goes to 60000 iops with the same kworker at around 97-100% cpu.
>
> Hmm, this is probably the messenger.c worker then that is feeding messages
> to the network. How many OSDs do you have? It should be able to scale
> with the number of OSDs.
>
> sage
>
>>
>> ----- Original Message -----
>>
>> From: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>
>> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> Cc: "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Tuesday, October 28, 2014 19:07:25
>> Subject: Re: krbd blk-mq support ?
>>
>> On Mon, Oct 27, 2014 at 11:00:46AM +0100, Alexandre DERUMIER wrote:
>> > >>Can you do a perf report -ag and then a perf report to see where these
>> > >>cycles are spent?
>> >
>> > Yes, sure.
>> >
>> > I have attached the perf report to this mail.
>> > (This is with kernel 3.14; I don't have access to my 3.18 host for now.)
>>
>> Oh, that's without the blk-mq patch?
>>
>> Either way the profile doesn't really sum up to a fully used up
>> cpu. Sage, Alex - are there any ordering constraints in the rbd client?
>> If not we could probably aim for per-cpu queues using blk-mq and a
>> socket per cpu or similar.
>>

--
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
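
(Note on the nocrc test mentioned above: with the kernel rbd client, CRC behaviour is set at map time via a map option rather than through ceph.conf, so the test would have looked roughly like the sketch below. Only the fact that the volume was mapped "with nocrc" appears in the thread; the exact flag spelling is an assumption and may differ by rbd version.

  # map the image with data CRCs disabled (assumed syntax), then re-run the
  # same fio job against the resulting /dev/rbd/test/test device
  rbd map -o nocrc test/test
)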