Did you see this kind of regression in SRP, or with some other target
(e.g. TGT)?
Trying to understand whether it's a ULP issue or an LLD one...
On 6/20/2016 6:23 PM, Robert LeBlanc wrote:
Adding linux-scsi
This last week I tried to figure out where a 10-15% decrease in
performance showed up between 4.5 and 4.6 using iSER with ConnectX-3
and Connect-IB cards (10.{218,219}.*.17 are Connect-IB and 10.220.*.17
are ConnectX-3). To review: straight RDMA transfers between the cards
showed that line rate was being achieved; it was only iSER that could
not reach those same rates for some cards on certain kernels.
4.5 vanilla default config
sdc;10.218.128.17;3800048;950012;22075
sdi;10.218.202.17;3757158;939289;22327
sdg;10.218.203.17;3774062;943515;22227
sdn;10.218.204.17;3816299;954074;21981
sdd;10.219.128.17;3821863;955465;21949
sdf;10.219.202.17;3784106;946026;22168
sdj;10.219.203.17;3827094;956773;21919
sdm;10.219.204.17;3788208;947052;22144
sde;10.220.128.17;5054596;1263649;16596
sdh;10.220.202.17;5013811;1253452;16731
sdl;10.220.203.17;5052160;1263040;16604
sdk;10.220.204.17;4990248;1247562;16810
4.6 vanilla default config
sde;10.218.128.17;3431063;857765;24449
sdf;10.218.202.17;3360685;840171;24961
sdi;10.218.203.17;3355174;838793;25002
sdm;10.218.204.17;3360955;840238;24959
sdd;10.219.128.17;3337288;834322;25136
sdh;10.219.202.17;3327492;831873;25210
sdj;10.219.203.17;3380867;845216;24812
sdk;10.219.204.17;3418340;854585;24540
sdc;10.220.128.17;4668377;1167094;17969
sdg;10.220.202.17;4716675;1179168;17785
sdl;10.220.203.17;4675663;1168915;17941
sdn;10.220.204.17;4631519;1157879;18112
I narrowed the performance degradation to this series
(7861728..5e47f19), but while trying to bisect it, the results were so
erratic between commits that I could not figure out exactly which one
introduced the issue. If someone could give me some pointers on what
to do, I can keep digging through this.
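For anyone wanting to reproduce the bisect, the bookkeeping looks like
the sketch below. It runs against a throwaway repository so the flow is
self-contained; in the real case each step means building and booting
the candidate kernel and re-running the fio test, so the pass/fail
check here (a counter file) is purely a stand-in:

```shell
#!/bin/sh
# Illustrative git-bisect flow in a scratch repo; the pass/fail test
# is a stand-in for "rebuild the kernel, reboot, run fio, compare IOPS".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "Example User"

# Five commits; pretend the regression lands in commit 4.
for i in 1 2 3 4 5; do
    echo "$i" > counter
    git add counter
    git commit -qm "commit $i"
done

git bisect start
git bisect bad HEAD          # e.g. 5e47f19: shows the regression
git bisect good HEAD~4       # e.g. 7861728: performs well

# When the test is scriptable, bisect can drive itself; here
# "counter < 4" simulates "IOPS still at the good level":
git bisect run sh -c 'test "$(cat counter)" -lt 4'
```

When the per-commit results jump around, it also helps to run the fio
test several times per step and only call a commit bad when the average
is clearly below the good level.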
4.5.0_rc5_7861728d_00001
sdc;10.218.128.17;3747591;936897;22384
sdf;10.218.202.17;3750607;937651;22366
sdh;10.218.203.17;3750439;937609;22367
sdn;10.218.204.17;3771008;942752;22245
sde;10.219.128.17;3867678;966919;21689
sdg;10.219.202.17;3781889;945472;22181
sdk;10.219.203.17;3791804;947951;22123
sdl;10.219.204.17;3795406;948851;22102
sdd;10.220.128.17;5039110;1259777;16647
sdi;10.220.202.17;4992921;1248230;16801
sdj;10.220.203.17;5015610;1253902;16725
sdm;10.220.204.17;5087087;1271771;16490
4.5.0_rc5_f81bf458_00018
sdb;10.218.128.17;5023720;1255930;16698
sde;10.218.202.17;5016809;1254202;16721
sdj;10.218.203.17;5021915;1255478;16704
sdk;10.218.204.17;5021314;1255328;16706
sdc;10.219.128.17;4984318;1246079;16830
sdf;10.219.202.17;4986096;1246524;16824
sdh;10.219.203.17;5043958;1260989;16631
sdm;10.219.204.17;5032460;1258115;16669
sdd;10.220.128.17;3736740;934185;22449
sdg;10.220.202.17;3728767;932191;22497
sdi;10.220.203.17;3752117;938029;22357
sdl;10.220.204.17;3763901;940975;22287
4.5.0_rc5_07b63196_00027
sdb;10.218.128.17;3606142;901535;23262
sdg;10.218.202.17;3570988;892747;23491
sdf;10.218.203.17;3576011;894002;23458
sdk;10.218.204.17;3558113;889528;23576
sdc;10.219.128.17;3577384;894346;23449
sde;10.219.202.17;3575401;893850;23462
sdj;10.219.203.17;3567798;891949;23512
sdl;10.219.204.17;3584262;896065;23404
sdd;10.220.128.17;4430680;1107670;18933
sdh;10.220.202.17;4488286;1122071;18690
sdi;10.220.203.17;4487326;1121831;18694
sdm;10.220.204.17;4441236;1110309;18888
4.5.0_rc5_5e47f198_00036
sdb;10.218.128.17;3519597;879899;23834
sdi;10.218.202.17;3512229;878057;23884
sdh;10.218.203.17;3518563;879640;23841
sdk;10.218.204.17;3582119;895529;23418
sdd;10.219.128.17;3550883;887720;23624
sdj;10.219.202.17;3558415;889603;23574
sde;10.219.203.17;3552086;888021;23616
sdl;10.219.204.17;3579521;894880;23435
sdc;10.220.128.17;4532912;1133228;18506
sdf;10.220.202.17;4558035;1139508;18404
sdg;10.220.203.17;4601035;1150258;18232
sdm;10.220.204.17;4548150;1137037;18444
While bisecting the kernel, I also stumbled across one commit that
worked really well for both adapters, performance I haven't seen in
the release kernels.
4.5.0_rc3_1aaa57f5_00399
sdc;10.218.128.17;4627942;1156985;18126
sdf;10.218.202.17;4590963;1147740;18272
sdk;10.218.203.17;4564980;1141245;18376
sdn;10.218.204.17;4571946;1142986;18348
sdd;10.219.128.17;4591717;1147929;18269
sdi;10.219.202.17;4505644;1126411;18618
sdg;10.219.203.17;4562001;1140500;18388
sdl;10.219.204.17;4583187;1145796;18303
sde;10.220.128.17;5511568;1377892;15220
sdh;10.220.202.17;5515555;1378888;15209
sdj;10.220.203.17;5609983;1402495;14953
sdm;10.220.204.17;5509035;1377258;15227
Here the ConnectX-3 card is performing perfectly while the Connect-IB
card still has some room for improvement.
I'd like to get to the bottom of why I'm not seeing the same
performance out of the newer kernels, but I just don't understand the
code. I've done what I can to narrow down where the major changes
happened in the kernel, in the hope that it will help someone on the
list. If there is anything I can do to help out, please let me know.
Thank you,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Fri, Jun 10, 2016 at 3:36 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
I bisected the kernel and it looks like the performance of the
Connect-IB card goes down and the performance of the ConnectX-3 card
goes up with this commit (though I'm not sure why it would cause this):
ab46db0a3325a064bb24e826b12995d157565efb is the first bad commit
commit ab46db0a3325a064bb24e826b12995d157565efb
Author: Jiri Olsa <jolsa@xxxxxxxxxx>
Date: Thu Dec 3 10:06:43 2015 +0100
perf stat: Use perf_evlist__enable in handle_initial_delay
No need to mimic the behaviour of perf_evlist__enable, we can use it
directly.
Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
Tested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Cc: David Ahern <dsahern@xxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Link: http://lkml.kernel.org/r/1449133606-14429-5-git-send-email-jolsa@xxxxxxxxxx
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
:040000 040000 67e69893bf6d47b372e08d7089d37a7b9f602fa7
b63d9b366f078eabf86f4da3d1cc53ae7434a949 M tools
4.4.0_rc2_3e27c920
sdc;10.218.128.17;5291495;1322873;15853
sde;10.218.202.17;4966024;1241506;16892
sdh;10.218.203.17;4980471;1245117;16843
sdk;10.218.204.17;4966612;1241653;16890
sdd;10.219.128.17;5060084;1265021;16578
sdf;10.219.202.17;5065278;1266319;16561
sdi;10.219.203.17;5047600;1261900;16619
sdl;10.219.204.17;5036992;1259248;16654
sdn;10.220.128.17;3775081;943770;22221
sdg;10.220.202.17;3758336;939584;22320
sdj;10.220.203.17;3792832;948208;22117
sdm;10.220.204.17;3771516;942879;22242
4.4.0_rc2_ab46db0a
sdc;10.218.128.17;3792146;948036;22121
sdf;10.218.202.17;3738405;934601;22439
sdj;10.218.203.17;3764239;941059;22285
sdl;10.218.204.17;3785302;946325;22161
sdd;10.219.128.17;3762382;940595;22296
sdg;10.219.202.17;3765760;941440;22276
sdi;10.219.203.17;3873751;968437;21655
sdm;10.219.204.17;3769483;942370;22254
sde;10.220.128.17;5022517;1255629;16702
sdh;10.220.202.17;5018911;1254727;16714
sdk;10.220.203.17;5037295;1259323;16653
sdn;10.220.204.17;5033064;1258266;16667
On Wed, Jun 8, 2016 at 9:33 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
With 4.1.15, the C-IB card gets about 1.15 MIOPs, while the CX3 gets
about 0.99 MIOPs. But starting with the 4.4.4 kernel, the C-IB card
drops to 0.96 MIOPs and the CX3 card jumps to 1.25 MIOPs. In the 4.6.0
kernel, both cards drop, the C-IB to 0.82 MIOPs and the CX3 to 1.15
MIOPs. I confirmed this morning that the card order was swapped on the
4.6.0 kernel and it was not different ports of the C-IB performing
differently, but different cards.
Given the limitations of the PCIe x8 slot for the CX3, I think 1.25
MIOPs is about the best we can do there. In summary, the performance
of the C-IB card drops after 4.1.15 and gets progressively worse with
newer kernels. The CX3 card peaks at the 4.4.4 kernel and degrades a
bit on the 4.6.0 kernel.
Increasing the IO depth by adding jobs does not improve performance;
it actually decreases it. Based on an average of 4 runs at each job
count from 1-80, the Goldilocks zone is 31-57 jobs, where the
difference in performance is less than 1%.
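For reference, the sweep was along these lines (a sketch; the fio
invocation mirrors the one given elsewhere in this thread, and field 8
of fio's --minimal output is read IOPS):

```shell
#!/bin/sh
# Sketch of the numjobs sweep: average several runs per job count.
# Job counts shown are a sample of the 1-80 range described above.
for jobs in 8 16 24 32 40 48 56 64 72 80; do
    for run in 1 2 3 4; do
        fio --rw=read --bs=4K --size=2G --numjobs="$jobs" \
            --name=worker.matt --group_reporting --minimal 2>/dev/null |
            cut -d';' -f8
    done | awk -v j="$jobs" \
        '{ s += $1 } END { if (NR) printf "%d jobs: %.0f IOPS avg\n", j, s / NR }'
done
```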
Similarly, increasing the block request size does not move the
figures toward line speed.
Here is the output of the 4.6.0 kernel with 4M bs:
sdc;10.218.128.17;3354638;819;25006
sdf;10.218.202.17;3376920;824;24841
sdm;10.218.203.17;3367431;822;24911
sdk;10.218.204.17;3378960;824;24826
sde;10.219.128.17;3366350;821;24919
sdl;10.219.202.17;3379641;825;24821
sdg;10.219.203.17;3391254;827;24736
sdn;10.219.204.17;3401706;830;24660
sdd;10.220.128.17;4597505;1122;18246
sdi;10.220.202.17;4594231;1121;18259
sdj;10.220.203.17;4667598;1139;17972
sdh;10.220.204.17;4628197;1129;18125
On the target, a kworker thread is at 96% CPU, but no single processor
is over 15% utilized. The initiator has low fio CPU utilization (<10%)
for each job and no single CPU over 22% utilized.
I have tried manually spreading the IRQ affinity over the processors
of the respective NUMA nodes, with no noticeable change in
performance.
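For the record, the manual spreading was along these lines (a dry-run
sketch: it only prints the commands; the HCA name, the
/proc/interrupts naming, and the fallback cpulist are assumptions that
vary by driver and box — pipe the output to sh as root to apply it):

```shell
#!/bin/sh
# Dry-run sketch: emit the commands that pin each of an HCA's MSI-X
# vectors to the CPUs of its local NUMA node.
DEV=mlx4_0       # assumed HCA name (mlx5_0, etc. on other drivers)
NODE_CPUS=$(cat /sys/class/infiniband/$DEV/device/local_cpulist \
            2>/dev/null || echo "0-11")   # fallback is a placeholder

grep "$DEV" /proc/interrupts | awk -F: '{ gsub(/ /, "", $1); print $1 }' |
while read -r irq; do
    echo "echo $NODE_CPUS > /proc/irq/$irq/smp_affinity_list"
done
```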
Loading ib_iser with always_register=N on the initiator shows maybe a
slight increase in performance:
sdc;10.218.128.17;3396885;849221;24695
sdf;10.218.202.17;3429240;857310;24462
sdi;10.218.203.17;3454234;863558;24285
sdm;10.218.204.17;3391666;847916;24733
sde;10.219.128.17;3403914;850978;24644
sdh;10.219.202.17;3491034;872758;24029
sdk;10.219.203.17;3390569;847642;24741
sdl;10.219.204.17;3498898;874724;23975
sdd;10.220.128.17;4664743;1166185;17983
sdg;10.220.202.17;4624880;1156220;18138
sdj;10.220.203.17;4616227;1154056;18172
sdn;10.220.204.17;4619786;1154946;18158
I'd like to see the C-IB card at 1.25+ MIOPs (I know the target can
deliver that performance, and we were limited on the CX3 by the PCIe
bus, which isn't an issue for a single port on the x16 C-IB card).
Although the loss of performance in the CX3 card is concerning, I'm
mostly focused on the C-IB card at the moment. I will probably start
bisecting 4.1.15 to 4.4.4 to see if I can identify where the
performance of the C-IB card degrades.
On Wed, Jun 8, 2016 at 7:52 AM, Max Gurtovoy <maxg@xxxxxxxxxxxx> wrote:
On 6/8/2016 1:37 AM, Robert LeBlanc wrote:
On the 4.1.15 kernel:
sdc;10.218.128.17;3971878;992969;21120
sdd;10.218.202.17;3967745;991936;21142
sdg;10.218.203.17;3938128;984532;21301
sdk;10.218.204.17;3952602;988150;21223
sdn;10.219.128.17;4615719;1153929;18174
sdf;10.219.202.17;4622331;1155582;18148
sdi;10.219.203.17;4602297;1150574;18227
sdl;10.219.204.17;4565477;1141369;18374
sde;10.220.128.17;4594986;1148746;18256
sdh;10.220.202.17;4590209;1147552;18275
sdj;10.220.203.17;4599017;1149754;18240
sdm;10.220.204.17;4610898;1152724;18193
On the 4.6.0 kernel:
sdc;10.218.128.17;3239219;809804;25897
sdf;10.218.202.17;3321300;830325;25257
sdm;10.218.203.17;3339015;834753;25123
sdk;10.218.204.17;3637573;909393;23061
sde;10.219.128.17;3325777;831444;25223
sdl;10.219.202.17;3305464;826366;25378
sdg;10.219.203.17;3304032;826008;25389
sdn;10.219.204.17;3330001;832500;25191
sdd;10.220.128.17;4624370;1156092;18140
sdi;10.220.202.17;4619277;1154819;18160
sdj;10.220.203.17;4610138;1152534;18196
sdh;10.220.204.17;4586445;1146611;18290
It seems that there are a lot of changes between these kernels. I had
these kernels already on the box, and I can bisect them if you think
it would help. It is really odd that port 2 on the Connect-IB card did
better than port 1 on the 4.6.0 kernel.
So in these kernels you get better performance with the C-IB than the CX3?
We need to find the bottleneck.
Can you increase the iodepth and/or block size to see if we can reach
wire speed?
Another thing to try is loading ib_iser with always_register=N.
What is the CPU utilization on both initiator and target?
Did you spread the IRQ affinity?
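For reference, the always_register experiment amounts to
`modprobe -r ib_iser && modprobe ib_iser always_register=N` after
logging out of all iSER sessions; to make it persistent, a modprobe.d
fragment along these lines should work (path and filename are
assumptions and distro-dependent):

```
# /etc/modprobe.d/ib_iser.conf  (assumed path)
# Set the ib_iser module parameter suggested above (default is Y).
options ib_iser always_register=N
```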
On Tue, Jun 7, 2016 at 10:48 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx>
wrote:
The target is LIO (same kernel) with a 200 GB RAM disk, and I'm
running fio as follows:
fio --rw=read --bs=4K --size=2G --numjobs=40 --name=worker.matt
--group_reporting --minimal | cut -d';' -f7,8,9
All of the paths use the same settings: the noop scheduler, with
nomerges set to either 1 or 2 (it doesn't make a big difference).
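Concretely, the per-path settings are applied along these lines (a
sketch; the device list is an example matching the runs above, and the
sysfs writes need root):

```shell
#!/bin/sh
# Sketch: set the noop elevator and nomerges=2 on every path device.
# Device names are examples; the guard skips devices that don't exist
# or that we can't write to.
for d in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn; do
    q="/sys/block/$d/queue"
    if [ -w "$q/scheduler" ]; then
        echo noop > "$q/scheduler"
        echo 2    > "$q/nomerges"
    fi
done
```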
I started looking into this when the 4.6 kernel wasn't performing as
well as we had been able to get the 4.4 kernel to perform. I went back
to the 4.4 kernel and could not replicate the 4+ million IOPS. So I
started breaking the problem down into smaller pieces and found this
anomaly. Since there haven't been any suggestions up to this point,
I'll check other kernel versions to see if it is specific to certain
kernels. If you need more information, please let me know.
Thanks,
On Tue, Jun 7, 2016 at 6:02 AM, Max Gurtovoy <maxg@xxxxxxxxxxxx> wrote:
On 6/7/2016 1:36 AM, Robert LeBlanc wrote:
I'm trying to understand why our Connect-IB card is not performing as
well as our ConnectX-3 card. There are 3 ports between the two cards
and 12 paths to the iSER target, which is a RAM disk.
<snip>
When I run fio against each path individually, I get:
What is the scenario (bs, numjobs, iodepth) for each run?
Which target do you use? Backing store?
disk;target IP;bandwidth;IOPS;execution time
sdn;10.218.128.17;5053682;1263420;16599
sde;10.218.202.17;5032158;1258039;16670
sdh;10.218.203.17;4993516;1248379;16799
sdk;10.218.204.17;5081848;1270462;16507
sdc;10.219.128.17;3750942;937735;22364
sdf;10.219.202.17;3746921;936730;22388
sdi;10.219.203.17;3873929;968482;21654
sdl;10.219.204.17;3841465;960366;21837
sdd;10.220.128.17;3760358;940089;22308
sdg;10.220.202.17;3866252;966563;21697
sdj;10.220.203.17;3757495;939373;22325
sdm;10.220.204.17;4064051;1016012;20641