>>Could you describe more about 2x70000 iops?
>>So you mean 8 OSDs, each backed by an SSD, can achieve 140k iops?

It's a small rbd (10G), so most reads hit the buffer cache.
But yes, it's able to deliver 140000 iops with 8 osds.
(I also checked the stats in the ceph cluster to be sure, and I'm not cpu bound on the osd nodes.)

>> 2014-10-31 05:58:34.231037 mon.0 [INF] pgmap v7109: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 560 MB/s rd, 140 kop/s

Here is the ceph.conf of the osd nodes:

[global]
fsid = c29f4643-9577-4671-ae25-59ad14550aba
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
osd_enable_op_tracker = false

>>Is it read or write? Could you give the fio options?

Random read, 4K. Here is the fio config:

[global]
ioengine=aio
invalidate=1
rw=randread
bs=4K
direct=1
numjobs=1
group_reporting=1
size=10G

[test1]
iodepth=64
filename=/dev/rbd/test/test

On 1 client node, I can't reach more than 50000 iops with 6 osds or 70000 iops with 8 osds.
(I tried increasing numjobs to run more fio processes, and also running against 2 different rbd volumes at the same time, but performance is the same.)

>> 2014-10-31 05:57:30.078348 mon.0 [INF] pgmap v7070: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 290 MB/s rd, 74572 op/s

But if I launch the same fio test on another client node, I can reach the same 70000 iops at the same time (see the per-client command sketch further down).

>> 2014-10-31 05:58:34.231037 mon.0 [INF] pgmap v7109: 1264 pgs: 1264 active+clean; 165 GB data, 109 GB used, 6226 GB / 6335 GB avail; 560 MB/s rd, 140 kop/s


----- Original Message -----

From: "Haomai Wang" <haomaiwang@xxxxxxxxx>
To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Thursday, October 30, 2014 18:05:26
Subject: Re: krbd blk-mq support ?

Could you describe more about 2x70000 iops?

So you mean 8 OSDs, each backed by an SSD, can achieve 140k iops?

Is it read or write? Could you give the fio options?

On Fri, Oct 31, 2014 at 12:01 AM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>>I'll try to add more OSDs next week; if it scales, it's very good news!
>
> I just tried adding 2 more osds.
>
> I can now reach 2x 70000 iops on 2 client nodes (vs 2x 50000 previously).
>
> And kworker cpu usage is also lower (84% vs 97%).
> (I don't understand exactly why.)
>
> So, thanks for the help, everybody!
>
>
> ----- Original Message -----
>
> From: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> Cc: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Thursday, October 30, 2014 09:11:11
> Subject: Re: krbd blk-mq support ?
>
>>>Hmm, this is probably the messenger.c worker then that is feeding messages
>>>to the network. How many OSDs do you have? It should be able to scale
>>>with the number of OSDs.
>
> Thanks, Sage, for your reply.
>
> Currently 6 OSDs (ssd) on the test platform.
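
(For reference, each client maps the image with the kernel rbd client and runs fio against the resulting block device. A minimal sketch, assuming the test/test pool/image name from the fio job above; the exact invocation and the job-file name are not in the thread, they are only illustrative:

  rbd map test/test          # exposes the image as /dev/rbd/test/test via the udev rules
  fio randread-4k.fio        # the 4K randread job file quoted at the top of this mail
)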
>
> But I can reach 2x 50000 iops on the same rbd volume with 2 clients on 2 different hosts.
> Do you think the messenger.c worker can be the bottleneck in this case?
>
> I'll try to add more OSDs next week; if it scales, it's very good news!
>
>
> ----- Original Message -----
>
> From: "Sage Weil" <sage@xxxxxxxxxxxx>
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
> Cc: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>, "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, October 29, 2014 16:00:56
> Subject: Re: krbd blk-mq support ?
>
> On Wed, 29 Oct 2014, Alexandre DERUMIER wrote:
>> >>Oh, that's without the blk-mq patch?
>>
>> Yes, sorry, I don't know how to use perf with a custom-compiled kernel.
>> (Usually I'm using perf from debian, with the linux-tools package provided with the debian kernel package.)
>>
>> >>Either way the profile doesn't really sum up to a fully used up cpu.
>>
>> But I see mostly the same behaviour with or without the blk-mq patch; I always have 1 kworker at around 97-100% cpu (1 core) for 50000 iops.
>>
>> I also tried mapping the rbd volume with nocrc; it goes to 60000 iops with the same kworker at around 97-100% cpu.
>
> Hmm, this is probably the messenger.c worker then that is feeding messages
> to the network. How many OSDs do you have? It should be able to scale
> with the number of OSDs.
>
> sage
>
>>
>> ----- Original Message -----
>>
>> From: "Christoph Hellwig" <hch@xxxxxxxxxxxxx>
>> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> Cc: "Ceph Devel" <ceph-devel@xxxxxxxxxxxxxxx>
>> Sent: Tuesday, October 28, 2014 19:07:25
>> Subject: Re: krbd blk-mq support ?
>>
>> On Mon, Oct 27, 2014 at 11:00:46AM +0100, Alexandre DERUMIER wrote:
>> > >>Can you do a perf report -ag and then a perf report to see where these
>> > >>cycles are spent?
>> >
>> > Yes, sure.
>> >
>> > I have attached the perf report to this mail.
>> > (This is with kernel 3.14; I don't have access to my 3.18 host for now.)
>>
>> Oh, that's without the blk-mq patch?
>>
>> Either way the profile doesn't really sum up to a fully used up
>> cpu. Sage, Alex - are there any ordering constraints in the rbd client?
>> If not we could probably aim for per-cpu queues using blk-mq and a
>> socket per cpu or similar.
>>

--
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
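
(Note on the nocrc test mentioned above: with the kernel rbd client, CRC behaviour is set at map time via a map option rather than through ceph.conf, so the test would have looked roughly like the sketch below. Only the fact that the volume was mapped "with nocrc" appears in the thread; the exact flag spelling is an assumption and may differ by rbd version.

  # map the image with data CRCs disabled (assumed syntax), then re-run the
  # same fio job against the resulting /dev/rbd/test/test device
  rbd map -o nocrc test/test
)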