Re: Connect-IB not performing as well as ConnectX-3 with iSER

I can test with SRP and report back what I find (haven't used SRP in
years so I'll need to brush up on it).
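(For my own notes, I think the quickest way back into it is something
like this -- a sketch from memory, the daemon flags and device name are
assumptions from rdma-core, not verified:

  modprobe ib_srp
  # discover SRP targets on the fabric and connect to them, one pass
  srp_daemon -e -o -n -i mlx5_0 -p 1
)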
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jun 20, 2016 at 3:27 PM, Max Gurtovoy <maxg@xxxxxxxxxxxx> wrote:
> Did you see this kind of regression with SRP, or with some other target
> (e.g. TGT)?
> Trying to understand whether it's a ULP issue or an LLD one...
>
>
> On 6/20/2016 6:23 PM, Robert LeBlanc wrote:
>>
>> Adding linux-scsi
>>
>> This last week I tried to figure out where a 10-15% decrease in
>> performance showed up between 4.5 and 4.6 using iSER with ConnectX-3
>> and Connect-IB cards (10.{218,219}.*.17 are Connect-IB and 10.220.*.17
>> are ConnectX-3). To review: straight RDMA transfers between the cards
>> achieved line rate; it was only iSER that could not reach those same
>> rates for some cards on some kernels.
>>
>> 4.5 vanilla default config
>> sdc;10.218.128.17;3800048;950012;22075
>> sdi;10.218.202.17;3757158;939289;22327
>> sdg;10.218.203.17;3774062;943515;22227
>> sdn;10.218.204.17;3816299;954074;21981
>> sdd;10.219.128.17;3821863;955465;21949
>> sdf;10.219.202.17;3784106;946026;22168
>> sdj;10.219.203.17;3827094;956773;21919
>> sdm;10.219.204.17;3788208;947052;22144
>> sde;10.220.128.17;5054596;1263649;16596
>> sdh;10.220.202.17;5013811;1253452;16731
>> sdl;10.220.203.17;5052160;1263040;16604
>> sdk;10.220.204.17;4990248;1247562;16810
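>>
>> (To compare runs at a glance, I average the bandwidth column per subnet
>> with a quick awk one-liner -- a sketch, assuming a run is saved to
>> results.csv in the format above:
>>
>>   awk -F';' '{ split($2, ip, "."); bw[ip[2]] += $3; n[ip[2]]++ }
>>     END { for (s in n) printf "10.%s.*.17 avg: %d KB/s\n", s, bw[s]/n[s] }' results.csv
>> )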
>>
>> 4.6 vanilla default config
>> sde;10.218.128.17;3431063;857765;24449
>> sdf;10.218.202.17;3360685;840171;24961
>> sdi;10.218.203.17;3355174;838793;25002
>> sdm;10.218.204.17;3360955;840238;24959
>> sdd;10.219.128.17;3337288;834322;25136
>> sdh;10.219.202.17;3327492;831873;25210
>> sdj;10.219.203.17;3380867;845216;24812
>> sdk;10.219.204.17;3418340;854585;24540
>> sdc;10.220.128.17;4668377;1167094;17969
>> sdg;10.220.202.17;4716675;1179168;17785
>> sdl;10.220.203.17;4675663;1168915;17941
>> sdn;10.220.204.17;4631519;1157879;18112
>>
>> I narrowed the performance degradation to the series
>> 7861728..5e47f19, but while trying to bisect it, the results were so
>> erratic from commit to commit that I could not figure out exactly
>> which one introduced the issue. If someone could give me some pointers
>> on what to do, I can keep trying to dig through this.
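>>
>> (The bisect loop itself is the standard one -- a sketch; good/bad is
>> decided by the fio numbers:
>>
>>   git bisect start 5e47f19 7861728   # <bad> <good>
>>   # build and boot the kernel, run the fio job against all 12 paths, then:
>>   git bisect good   # or 'git bisect bad', repeat until it converges
>> )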
>>
>> 4.5.0_rc5_7861728d_00001
>> sdc;10.218.128.17;3747591;936897;22384
>> sdf;10.218.202.17;3750607;937651;22366
>> sdh;10.218.203.17;3750439;937609;22367
>> sdn;10.218.204.17;3771008;942752;22245
>> sde;10.219.128.17;3867678;966919;21689
>> sdg;10.219.202.17;3781889;945472;22181
>> sdk;10.219.203.17;3791804;947951;22123
>> sdl;10.219.204.17;3795406;948851;22102
>> sdd;10.220.128.17;5039110;1259777;16647
>> sdi;10.220.202.17;4992921;1248230;16801
>> sdj;10.220.203.17;5015610;1253902;16725
>> sdm;10.220.204.17;5087087;1271771;16490
>>
>> 4.5.0_rc5_f81bf458_00018
>> sdb;10.218.128.17;5023720;1255930;16698
>> sde;10.218.202.17;5016809;1254202;16721
>> sdj;10.218.203.17;5021915;1255478;16704
>> sdk;10.218.204.17;5021314;1255328;16706
>> sdc;10.219.128.17;4984318;1246079;16830
>> sdf;10.219.202.17;4986096;1246524;16824
>> sdh;10.219.203.17;5043958;1260989;16631
>> sdm;10.219.204.17;5032460;1258115;16669
>> sdd;10.220.128.17;3736740;934185;22449
>> sdg;10.220.202.17;3728767;932191;22497
>> sdi;10.220.203.17;3752117;938029;22357
>> sdl;10.220.204.17;3763901;940975;22287
>>
>> 4.5.0_rc5_07b63196_00027
>> sdb;10.218.128.17;3606142;901535;23262
>> sdg;10.218.202.17;3570988;892747;23491
>> sdf;10.218.203.17;3576011;894002;23458
>> sdk;10.218.204.17;3558113;889528;23576
>> sdc;10.219.128.17;3577384;894346;23449
>> sde;10.219.202.17;3575401;893850;23462
>> sdj;10.219.203.17;3567798;891949;23512
>> sdl;10.219.204.17;3584262;896065;23404
>> sdd;10.220.128.17;4430680;1107670;18933
>> sdh;10.220.202.17;4488286;1122071;18690
>> sdi;10.220.203.17;4487326;1121831;18694
>> sdm;10.220.204.17;4441236;1110309;18888
>>
>> 4.5.0_rc5_5e47f198_00036
>> sdb;10.218.128.17;3519597;879899;23834
>> sdi;10.218.202.17;3512229;878057;23884
>> sdh;10.218.203.17;3518563;879640;23841
>> sdk;10.218.204.17;3582119;895529;23418
>> sdd;10.219.128.17;3550883;887720;23624
>> sdj;10.219.202.17;3558415;889603;23574
>> sde;10.219.203.17;3552086;888021;23616
>> sdl;10.219.204.17;3579521;894880;23435
>> sdc;10.220.128.17;4532912;1133228;18506
>> sdf;10.220.202.17;4558035;1139508;18404
>> sdg;10.220.203.17;4601035;1150258;18232
>> sdm;10.220.204.17;4548150;1137037;18444
>>
>> While bisecting the kernel, I also stumbled across one commit that
>> worked really well for both adapters, behavior which I haven't seen in
>> the release kernels.
>>
>> 4.5.0_rc3_1aaa57f5_00399
>> sdc;10.218.128.17;4627942;1156985;18126
>> sdf;10.218.202.17;4590963;1147740;18272
>> sdk;10.218.203.17;4564980;1141245;18376
>> sdn;10.218.204.17;4571946;1142986;18348
>> sdd;10.219.128.17;4591717;1147929;18269
>> sdi;10.219.202.17;4505644;1126411;18618
>> sdg;10.219.203.17;4562001;1140500;18388
>> sdl;10.219.204.17;4583187;1145796;18303
>> sde;10.220.128.17;5511568;1377892;15220
>> sdh;10.220.202.17;5515555;1378888;15209
>> sdj;10.220.203.17;5609983;1402495;14953
>> sdm;10.220.204.17;5509035;1377258;15227
>>
>> Here the ConnectX-3 card is performing perfectly while the Connect-IB
>> card still has some room for improvement.
>>
>> I'd like to get to the bottom of why I'm not seeing the same
>> performance out of the newer kernels, but I just don't understand the
>> code. I've done what I can to narrow down where the major changes
>> happened in the kernel, in the hope that it will help someone on the
>> list. If there is anything I can do to help out, please let me know.
>>
>> Thank you,
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Fri, Jun 10, 2016 at 3:36 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx>
>> wrote:
>>>
>>> I bisected the kernel, and it looks like the performance of the
>>> Connect-IB card goes down and the performance of the ConnectX-3 card
>>> goes up with this commit (though I don't see why it would cause this):
>>>
>>> ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
>>> commit ab46db0a3325a064bb24e826b12995d157565efb
>>> Author: Jiri Olsa <jolsa@xxxxxxxxxx>
>>> Date:   Thu Dec 3 10:06:43 2015 +0100
>>>
>>>    perf stat: Use perf_evlist__enable in handle_initial_delay
>>>
>>>    No need to mimic the behaviour of perf_evlist__enable, we can use it
>>>    directly.
>>>
>>>    Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>>>    Tested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>>>    Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
>>>    Cc: David Ahern <dsahern@xxxxxxxxx>
>>>    Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
>>>    Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>>>    Link:
>>> http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa@xxxxxxxxxx
>>>    Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>>>
>>> :040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
>>> b63d9b366f078eabf86f4da3d1cc53ae7434a949 M      tools
>>>
>>> 4.4.0_rc2_3e27c920
>>> sdc;10.218.128.17;5291495;1322873;15853
>>> sde;10.218.202.17;4966024;1241506;16892
>>> sdh;10.218.203.17;4980471;1245117;16843
>>> sdk;10.218.204.17;4966612;1241653;16890
>>> sdd;10.219.128.17;5060084;1265021;16578
>>> sdf;10.219.202.17;5065278;1266319;16561
>>> sdi;10.219.203.17;5047600;1261900;16619
>>> sdl;10.219.204.17;5036992;1259248;16654
>>> sdn;10.220.128.17;3775081;943770;22221
>>> sdg;10.220.202.17;3758336;939584;22320
>>> sdj;10.220.203.17;3792832;948208;22117
>>> sdm;10.220.204.17;3771516;942879;22242
>>>
>>> 4.4.0_rc2_ab46db0a
>>> sdc;10.218.128.17;3792146;948036;22121
>>> sdf;10.218.202.17;3738405;934601;22439
>>> sdj;10.218.203.17;3764239;941059;22285
>>> sdl;10.218.204.17;3785302;946325;22161
>>> sdd;10.219.128.17;3762382;940595;22296
>>> sdg;10.219.202.17;3765760;941440;22276
>>> sdi;10.219.203.17;3873751;968437;21655
>>> sdm;10.219.204.17;3769483;942370;22254
>>> sde;10.220.128.17;5022517;1255629;16702
>>> sdh;10.220.202.17;5018911;1254727;16714
>>> sdk;10.220.203.17;5037295;1259323;16653
>>> sdn;10.220.204.17;5033064;1258266;16667
>>>
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>
>>>
>>> On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
>>>> about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
>>>> drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the
>>>> 4.6.0 kernel both cards drop: the C-IB to 0.82 MIOPs and the CX3 to
>>>> 1.15 MIOPs. I confirmed this morning that the card enumeration order
>>>> was swapped on the 4.6.0 kernel, so it was not two ports of the C-IB
>>>> performing differently, but two different cards.
>>>>
>>>> Given the limitations of the PCIe 8x slot for the CX3, I think 1.25
>>>> MIOPs is about the best we can do there. In summary, the performance
>>>> of the C-IB card drops after 4.1.15 and gets progressively worse with
>>>> each newer kernel, while the CX3 card peaks at the 4.4.4 kernel and
>>>> degrades a bit on the 4.6.0 kernel.
>>>>
>>>> Increasing the IO depth by adding jobs does not improve performance;
>>>> it actually decreases it. Based on an average of 4 runs at each job
>>>> count from 1 to 80, the Goldilocks zone is 31-57 jobs, where the
>>>> difference in performance is less than 1% (sweep sketch below).
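>>>>
>>>> (The sweep was essentially this -- a sketch, same job parameters as
>>>> before:
>>>>
>>>>   for j in $(seq 1 80); do
>>>>     for run in 1 2 3 4; do
>>>>       fio --rw=read --bs=4K --size=2G --numjobs=$j --name=worker.matt \
>>>>           --group_reporting --minimal | cut -d';' -f8   # read IOPS
>>>>     done
>>>>   done
>>>> )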
>>>>
>>>> Similarly, increasing the block request size does not bring the
>>>> numbers any closer to line speed.
>>>>
>>>> Here is the output of the 4.6.0 kernel with 4M bs:
>>>> sdc;10.218.128.17;3354638;819;25006
>>>> sdf;10.218.202.17;3376920;824;24841
>>>> sdm;10.218.203.17;3367431;822;24911
>>>> sdk;10.218.204.17;3378960;824;24826
>>>> sde;10.219.128.17;3366350;821;24919
>>>> sdl;10.219.202.17;3379641;825;24821
>>>> sdg;10.219.203.17;3391254;827;24736
>>>> sdn;10.219.204.17;3401706;830;24660
>>>> sdd;10.220.128.17;4597505;1122;18246
>>>> sdi;10.220.202.17;4594231;1121;18259
>>>> sdj;10.220.203.17;4667598;1139;17972
>>>> sdh;10.220.204.17;4628197;1129;18125
>>>>
>>>> On the target, the top CPU consumer is a kworker thread at 96%, but
>>>> no single processor is over 15% utilized. On the initiator, fio CPU
>>>> utilization is low (<10%) for each job and no single CPU is over 22%
>>>> utilized.
>>>>
>>>> I have tried manually spreading the IRQ affinity over the processors
>>>> of the respective NUMA nodes, and there was no noticeable change in
>>>> performance when doing so (along the lines of the sketch below).
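>>>>
>>>> (Roughly like this -- a sketch; the mlx IRQ name match and the 16-CPU
>>>> node size are assumptions for illustration:
>>>>
>>>>   cpu=0
>>>>   for irq in $(awk '/mlx/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
>>>>     echo $cpu > /proc/irq/$irq/smp_affinity_list
>>>>     cpu=$(( (cpu + 1) % 16 ))   # round-robin over the node's CPUs
>>>>   done
>>>> )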
>>>>
>>>> Loading ib_iser with always_register=N on the initiator shows maybe a
>>>> slight increase in performance:
>>>>
>>>> sdc;10.218.128.17;3396885;849221;24695
>>>> sdf;10.218.202.17;3429240;857310;24462
>>>> sdi;10.218.203.17;3454234;863558;24285
>>>> sdm;10.218.204.17;3391666;847916;24733
>>>> sde;10.219.128.17;3403914;850978;24644
>>>> sdh;10.219.202.17;3491034;872758;24029
>>>> sdk;10.219.203.17;3390569;847642;24741
>>>> sdl;10.219.204.17;3498898;874724;23975
>>>> sdd;10.220.128.17;4664743;1166185;17983
>>>> sdg;10.220.202.17;4624880;1156220;18138
>>>> sdj;10.220.203.17;4616227;1154056;18172
>>>> sdn;10.220.204.17;4619786;1154946;18158
>>>>
>>>> I'd like to see the C-IB card at 1.25+ MIOPs. I know the target can
>>>> deliver that level of performance, and on the CX3 we were limited by
>>>> the PCIe bus, which is not an issue for a single port on the 16x C-IB
>>>> card. Although the loss of performance in the CX3 card is concerning,
>>>> I'm mostly focused on the C-IB card at the moment. I will probably
>>>> start bisecting 4.1.15 to 4.4.4 to see if I can identify where the
>>>> performance of the C-IB card degrades.
>>>> ----------------
>>>> Robert LeBlanc
>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>
>>>>
>>>> On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg@xxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
>>>>>>
>>>>>>
>>>>>> On the 4.1.15 kernel:
>>>>>> sdc;10.218.128.17;3971878;992969;21120
>>>>>> sdd;10.218.202.17;3967745;991936;21142
>>>>>> sdg;10.218.203.17;3938128;984532;21301
>>>>>> sdk;10.218.204.17;3952602;988150;21223
>>>>>> sdn;10.219.128.17;4615719;1153929;18174
>>>>>> sdf;10.219.202.17;4622331;1155582;18148
>>>>>> sdi;10.219.203.17;4602297;1150574;18227
>>>>>> sdl;10.219.204.17;4565477;1141369;18374
>>>>>> sde;10.220.128.17;4594986;1148746;18256
>>>>>> sdh;10.220.202.17;4590209;1147552;18275
>>>>>> sdj;10.220.203.17;4599017;1149754;18240
>>>>>> sdm;10.220.204.17;4610898;1152724;18193
>>>>>>
>>>>>> On the 4.6.0 kernel:
>>>>>> sdc;10.218.128.17;3239219;809804;25897
>>>>>> sdf;10.218.202.17;3321300;830325;25257
>>>>>> sdm;10.218.203.17;3339015;834753;25123
>>>>>> sdk;10.218.204.17;3637573;909393;23061
>>>>>> sde;10.219.128.17;3325777;831444;25223
>>>>>> sdl;10.219.202.17;3305464;826366;25378
>>>>>> sdg;10.219.203.17;3304032;826008;25389
>>>>>> sdn;10.219.204.17;3330001;832500;25191
>>>>>> sdd;10.220.128.17;4624370;1156092;18140
>>>>>> sdi;10.220.202.17;4619277;1154819;18160
>>>>>> sdj;10.220.203.17;4610138;1152534;18196
>>>>>> sdh;10.220.204.17;4586445;1146611;18290
>>>>>>
>>>>>> It seems that there are a lot of changes between these kernels. I
>>>>>> had these kernels already on the box, and I can bisect them if you
>>>>>> think it would help. It is really odd that port 2 on the Connect-IB
>>>>>> card did better than port 1 on the 4.6.0 kernel.
>>>>>> ----------------
>>>>>> Robert LeBlanc
>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>
>>>>>
>>>>>
>>>>> So in these kernels you get better performance with the C-IB than
>>>>> the CX3? We need to find the bottleneck.
>>>>> Can you increase the iodepth and/or block size to see if we can
>>>>> reach the wire speed?
>>>>> Another thing to try is loading ib_iser with always_register=N.
>>>>>
>>>>> What is the CPU utilization on both the initiator and the target?
>>>>> Did you spread the IRQ affinity?
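>>>>>
>>>>> (For always_register, that would be a module reload along the lines
>>>>> of:
>>>>>
>>>>>   modprobe -r ib_iser
>>>>>   modprobe ib_iser always_register=N
>>>>> )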
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> The target is LIO (same kernel) with a 200 GB RAM disk and I'm
>>>>>>> running
>>>>>>> fio as follows:
>>>>>>>
>>>>>>> fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
>>>>>>> --group_reporting --minimal |  cut -d';' -f7,8,9
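>>>>>>>
>>>>>>> (If I'm reading fio's terse format right, fields 7-9 are read
>>>>>>> bandwidth in KB/s, read IOPS, and read runtime in ms; those are the
>>>>>>> three numbers per device in the listings.)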
>>>>>>>
>>>>>>> All of the paths are configured the same, with the noop scheduler
>>>>>>> and nomerges set to either 1 or 2 (it doesn't make a big
>>>>>>> difference); see the sketch below.
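>>>>>>>
>>>>>>> (Per path, roughly -- sdX standing in for each device:
>>>>>>>
>>>>>>>   echo noop > /sys/block/sdX/queue/scheduler
>>>>>>>   echo 2 > /sys/block/sdX/queue/nomerges
>>>>>>> )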
>>>>>>>
>>>>>>> I started looking into this when the 4.6 kernel wasn't performing
>>>>>>> as well as we had been able to get the 4.4 kernel to perform. I
>>>>>>> went back to the 4.4 kernel and could not replicate the 4+ million
>>>>>>> IOPs, so I started breaking the problem down into smaller pieces
>>>>>>> and found this anomaly. Since there haven't been any suggestions up
>>>>>>> to this point, I'll check other kernel versions to see if the issue
>>>>>>> is specific to certain kernels. If you need more information,
>>>>>>> please let me know.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> ----------------
>>>>>>> Robert LeBlanc
>>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg@xxxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm trying to understand why our Connect-IB card is not
>>>>>>>>> performing as well as our ConnectX-3 card. There are 3 ports
>>>>>>>>> between the two cards and 12 paths to the iSER target, which is
>>>>>>>>> a RAM disk.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> When I run fio against each path individually, I get:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What is the scenario (bs, numjobs, iodepth) for each run?
>>>>>>>> Which target do you use? Backing store?
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> disk;target IP;bandwidth;IOPs;execution time
>>>>>>>>> sdn;10.218.128.17;5053682;1263420;16599
>>>>>>>>> sde;10.218.202.17;5032158;1258039;16670
>>>>>>>>> sdh;10.218.203.17;4993516;1248379;16799
>>>>>>>>> sdk;10.218.204.17;5081848;1270462;16507
>>>>>>>>> sdc;10.219.128.17;3750942;937735;22364
>>>>>>>>> sdf;10.219.202.17;3746921;936730;22388
>>>>>>>>> sdi;10.219.203.17;3873929;968482;21654
>>>>>>>>> sdl;10.219.204.17;3841465;960366;21837
>>>>>>>>> sdd;10.220.128.17;3760358;940089;22308
>>>>>>>>> sdg;10.220.202.17;3866252;966563;21697
>>>>>>>>> sdj;10.220.203.17;3757495;939373;22325
>>>>>>>>> sdm;10.220.204.17;4064051;1016012;20641
>>>>>>>>>
>>>>>
>