This time with the attachment.
Manish
On 12/09/2014 01:54 PM, Manish Awasthi wrote:
Resending:
dirty_ratio is the same for both kernels:
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
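(These can be re-checked on either kernel with, e.g.:
# sysctl -a | grep '^vm.dirty'
or by reading the individual files under /proc/sys/vm/.)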
I re-ran the tests with the same set of kernels, without enabling multithread support on 3.18, and measured a few things with perf.
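(Assuming the multithread support referred to here is md's group_thread_cnt knob, it can be confirmed as disabled on 3.18 with something like:
# cat /sys/block/md125/md/group_thread_cnt
where 0 means no extra worker groups, i.e. only the single raid5 thread.)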
perf-stat-<kernel>.txt: the test ran for some time and I measured various parameters with perf stat.
Meanwhile I'm also running the complete test under perf record; I'll share the results soon.
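(Roughly something like this, with <pid> being the md125_raid5 thread as in the perf stat output below:
# perf record -g -p <pid> -- sleep <test duration>
# perf report --stdio > perf-report-<kernel>.txt
)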
Manish
On 12/03/2014 11:51 AM, NeilBrown wrote:
On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
<manish.awasthi@xxxxxxxxxxxxxxxxxx> wrote:
Whatever comparison data I have is attached; I have consolidated it from the log files into Excel. See if this helps.
raid_3_18_performance.xls shows read throughput to be consistently 20% down on 3.18 compared to 3.6.11.
Writes are a few percent better for 4G/8G files, 20% better for 16G/32G files, and unchanged above that.
Given that you have 8G of RAM, that seems like it could be some change in caching behaviour, and not necessarily a change in RAID behaviour.
The CPU utilization roughly follows the throughput: 40% higher when write throughput is 20% better.
Could you check if the value of /proc/sys/vm/dirty_ratio is the same for both tests. That number has changed occasionally and could affect these tests.
The second file, 3SSDs-perf-2-Cores-3.18-rc1, has the "change" numbers negative where I expected positive, i.e. a negative value means an increase.
Writes consistently have higher CPU utilization.
Reads consistently have much lower CPU utilization.
I don't know what that means ... it might not mean anything.
Could you please run the tests between the two kernels *without* RAID, i.e. directly on an SSD. That will give us a baseline for what changes are caused by other parts of the kernel (filesystem, block layer, MM, etc). Then we can see how much change RAID5 is contributing.
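(As a rough baseline sketch only, not necessarily the benchmark used for the spreadsheets, a buffered sequential write/read against a single SSD on each kernel could look like this, with /dev/sdX1 and /mnt/ssd as hypothetical names:
# mount /dev/sdX1 /mnt/ssd
# dd if=/dev/zero of=/mnt/ssd/testfile bs=1M count=16384 conv=fsync
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/mnt/ssd/testfile of=/dev/null bs=1M
The drop_caches write is there so the read pass is not served from the page cache.)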
The third file, 3SSDs-perf-4Core.xls, seems to show significantly reduced throughput across the board.
CPU utilization is less (better) for writes, but worse for reads. That is the reverse of what the second file shows.
I might try running some tests across a set of kernel versions and see what I can come up with.
NeilBrown
perf stat on md125_raid5 -- kernel 3.6.11
# perf stat -p 2613 -e cycles,instructions,cache-references,cache-misses,branches,branch-misses,bus-cycles,stalled-cycles-frontend,ref-cycles,cpu-clock,task-clock,faults,context-switches,cpu-migrations,minor-faults,major-faults,alignment-faults,emulation-faults,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,LLC-loads,LLC-stores,LLC-prefetches,dTLB-load-misses,dTLB-store-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses
^C
Performance counter stats for process id '2613':
103,200,677,721 cycles # 2.848 GHz [22.72%]
69,669,813,983 instructions # 0.68 insns per cycle
# 1.07 stalled cycles per insn [27.26%]
2,668,465,769 cache-references # 73.648 M/sec [27.35%]
1,408,493,680 cache-misses # 52.783 % of all cache refs [27.17%]
13,609,211,321 branches # 375.607 M/sec [27.19%]
121,593,598 branch-misses # 0.89% of all branches [27.32%]
3,420,725,359 bus-cycles # 94.410 M/sec [18.07%]
74,362,368,252 stalled-cycles-frontend # 72.06% frontend cycles idle [18.16%]
112,553,945,650 ref-cycles # 3106.427 M/sec [22.76%]
36233.766411 cpu-clock (msec)
36232.605499 task-clock (msec) # 0.181 CPUs utilized
0 faults # 0.000 K/sec
442,885 context-switches # 0.012 M/sec
9,646 cpu-migrations # 0.266 K/sec
0 minor-faults # 0.000 K/sec
0 major-faults # 0.000 K/sec
0 alignment-faults # 0.000 K/sec
0 emulation-faults # 0.000 K/sec
3,188,865,936 L1-dcache-load-misses # 88.011 M/sec [22.96%]
1,658,831,957 L1-dcache-store-misses # 45.783 M/sec [22.89%]
338,744,029 L1-dcache-prefetch-misses # 9.349 M/sec [23.04%]
445,066,995 L1-icache-load-misses # 12.284 M/sec [22.99%]
1,578,067,225 LLC-loads # 43.554 M/sec [18.19%]
1,317,822,999 LLC-stores # 36.371 M/sec [18.23%]
798,004,610 LLC-prefetches # 22.024 M/sec [ 9.09%]
0 dTLB-load-misses # 0.000 K/sec [13.52%]
7,633,236 dTLB-store-misses # 0.211 M/sec [18.03%]
10,024,464 iTLB-loads # 0.277 M/sec [17.92%]
3,157,141 iTLB-load-misses # 31.49% of all iTLB cache hits [18.12%]
13,616,857,645 branch-loads # 375.818 M/sec [18.16%]
119,250,450 branch-load-misses # 3.291 M/sec [18.14%]
200.190181623 seconds time elapsed
perf stat on md125_raid5 -- kernel 3.18
# perf stat -p 2778 -e cycles,instructions,cache-references,cache-misses,branches,branch-misses,bus-cycles,stalled-cycles-frontend,ref-cycles,cpu-clock,task-clock,faults,context-switches,cpu-migrations,minor-faults,major-faults,alignment-faults,emulation-faults,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,LLC-loads,LLC-stores,LLC-prefetches,dTLB-load-misses,dTLB-store-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses
^C
Performance counter stats for process id '2778':
191,212,778,981 cycles # 2.942 GHz [22.99%]
160,318,628,367 instructions # 0.84 insns per cycle
# 0.77 stalled cycles per insn [27.49%]
3,800,688,695 cache-references # 58.485 M/sec [27.40%]
1,418,431,693 cache-misses # 37.320 % of all cache refs [27.27%]
33,635,552,951 branches # 517.586 M/sec [27.12%]
352,264,516 branch-misses # 1.05% of all branches [27.19%]
6,035,806,867 bus-cycles # 92.879 M/sec [18.21%]
122,980,401,285 stalled-cycles-frontend # 64.32% frontend cycles idle [18.16%]
197,829,618,312 ref-cycles # 3044.216 M/sec [22.72%]
65039.738267 cpu-clock (msec)
64985.415568 task-clock (msec) # 0.186 CPUs utilized
0 faults # 0.000 K/sec
3,437,945 context-switches # 0.053 M/sec
237 cpu-migrations # 0.004 K/sec
0 minor-faults # 0.000 K/sec
0 major-faults # 0.000 K/sec
0 alignment-faults # 0.000 K/sec
0 emulation-faults # 0.000 K/sec
5,329,711,939 L1-dcache-load-misses # 82.014 M/sec [22.83%]
2,138,400,107 L1-dcache-store-misses # 32.906 M/sec [22.52%]
667,646,968 L1-dcache-prefetch-misses # 10.274 M/sec [22.48%]
2,259,425,830 L1-icache-load-misses # 34.768 M/sec [22.45%]
2,090,596,777 LLC-loads # 32.170 M/sec [17.93%]
1,679,287,271 LLC-stores # 25.841 M/sec [18.04%]
1,120,086,147 LLC-prefetches # 17.236 M/sec [ 9.09%]
465,142,622 dTLB-load-misses # 7.158 M/sec [13.69%]
26,672,298 dTLB-store-misses # 0.410 M/sec [18.26%]
66,723,475 iTLB-loads # 1.027 M/sec [18.37%]
9,736,729 iTLB-load-misses # 14.59% of all iTLB cache hits [18.43%]
33,238,082,664 branch-loads # 511.470 M/sec [18.44%]
346,025,993 branch-load-misses # 5.325 M/sec [18.46%]
348.946853958 seconds time elapsed