* Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2013-07-30 11:33:21]:

> On Tue, Jul 30, 2013 at 02:45:43PM +0530, Srikar Dronamraju wrote:
> >
> > Can you please suggest workloads that I could try which might showcase
> > why you hate pure process based approach?
>
> 2 processes, 1 sysvshm segment. I know there's multi-process MPI
> libraries out there.
>
> Something like:
>
>   perf bench numa mem -p 2 -G 4096 -0 -z --no-data_rand_walk -Z
>

The above dumped core; it looks like -T is a must with -G.

I tried "perf bench numa mem -p 2 -T 32 -G 4096 -0 -z --no-data_rand_walk -Z".
It still didn't seem to do anything on my 4 node box (almost 2 hours and
nothing happened).

Finally I ran "perf bench numa mem -a" (both with HT disabled and enabled).

Convergence-wise my patchset did really well.

Bandwidth looks like a mixed bag: though there are improvements, we also
see degradations. I am not sure how to quantify which was the best among
the three. The nx1 tests were the ones where this patchset showed a
negative result; it was positive for all others.

Is this what you were looking for? Or was it something else?

(Lower is better)
testcase                     3.9.0   Mels v5  this_patchset  Units
------------------------------------------------------------------------------
1x3-convergence              0.320   100.060        100.204  secs
1x4-convergence            100.139   100.162        100.155  secs
1x6-convergence            100.455   100.179          1.078  secs
2x3-convergence            100.261   100.339          9.743  secs
3x3-convergence            100.213   100.168         10.073  secs
4x4-convergence            100.307   100.201         19.686  secs
4x4-convergence-NOTHP      100.229   100.221          3.189  secs
4x6-convergence            101.441   100.632          6.204  secs
4x8-convergence            100.680   100.588          5.275  secs
8x4-convergence            100.335   100.365         34.069  secs
8x4-convergence-NOTHP      100.331   100.412        100.478  secs
3x1-convergence              1.227     1.536          0.576  secs
4x1-convergence              1.224     1.063          1.390  secs
8x1-convergence              1.713     2.437          1.704  secs
16x1-convergence             2.750     2.677          1.856  secs
32x1-convergence             1.985     1.795          1.391  secs

(Higher is better)
testcase                     3.9.0   Mels v5  this_patchset  Units
------------------------------------------------------------------------------
RAM-bw-local                 3.341     3.340          3.325  GB/sec
RAM-bw-local-NOTHP           3.308     3.307          3.290  GB/sec
RAM-bw-remote                1.815     1.815          1.815  GB/sec
RAM-bw-local-2x              6.410     6.413          6.412  GB/sec
RAM-bw-remote-2x             3.020     3.041          3.027  GB/sec
RAM-bw-cross                 4.397     3.425          4.374  GB/sec
2x1-bw-process               3.481     3.442          3.492  GB/sec
3x1-bw-process               5.423     7.547          5.445  GB/sec
4x1-bw-process               5.108    11.009          5.118  GB/sec
8x1-bw-process               8.929    10.935          8.825  GB/sec
8x1-bw-process-NOTHP        12.754    11.442         22.889  GB/sec
16x1-bw-process             12.886    12.685         13.546  GB/sec
4x1-bw-thread               19.147    17.964          9.622  GB/sec
8x1-bw-thread               26.342    30.171         14.679  GB/sec
16x1-bw-thread              41.527    36.363         40.070  GB/sec
32x1-bw-thread              45.005    40.950         49.846  GB/sec
2x3-bw-thread                9.493    14.444          8.145  GB/sec
4x4-bw-thread               18.309    16.382         45.384  GB/sec
4x6-bw-thread               14.524    18.502         17.058  GB/sec
4x8-bw-thread               13.315    16.852         33.693  GB/sec
4x8-bw-thread-NOTHP         12.273    12.226         24.887  GB/sec
3x3-bw-thread               17.614    11.960         16.119  GB/sec
5x5-bw-thread               13.415    17.585         24.251  GB/sec
2x16-bw-thread              11.718    11.174         17.971  GB/sec
1x32-bw-thread              11.360    10.902         14.330  GB/sec
numa02-bw                   48.999    44.173         54.795  GB/sec
numa02-bw-NOTHP             47.655    42.600         53.445  GB/sec
numa01-bw-thread            36.983    39.692         45.254  GB/sec
numa01-bw-thread-NOTHP      38.486    35.208         44.118  GB/sec

With HT ON

(Lower is better)
testcase                     3.9.0   Mels v5  this_patchset  Units
------------------------------------------------------------------------------
1x3-convergence            100.114   100.138        100.084  secs
1x4-convergence              0.468   100.227        100.153  secs
1x6-convergence            100.278   100.400        100.197  secs
2x3-convergence            100.186     1.833         13.132  secs
3x3-convergence            100.302   100.457          2.087  secs
4x4-convergence            100.237   100.178          2.466  secs
4x4-convergence-NOTHP      100.148   100.251          2.985  secs
4x6-convergence            100.931     3.632          9.184  secs
4x8-convergence            100.398   100.456          4.801  secs
8x4-convergence            100.649   100.458          4.179  secs
8x4-convergence-NOTHP      100.391   100.428          9.758  secs
3x1-convergence              1.472     1.501          0.727  secs
4x1-convergence              1.478     1.489          1.408  secs
8x1-convergence              2.380     2.385          2.432  secs
16x1-convergence             3.260     3.399          2.219  secs
32x1-convergence             2.622     2.067          1.951  secs

(Higher is better)
testcase                     3.9.0   Mels v5  this_patchset  Units
------------------------------------------------------------------------------
RAM-bw-local                 3.333     3.342          3.345  GB/sec
RAM-bw-local-NOTHP           3.305     3.306          3.307  GB/sec
RAM-bw-remote                1.814     1.814          1.816  GB/sec
RAM-bw-local-2x              7.896     6.400          6.538  GB/sec
RAM-bw-remote-2x             2.982     3.038          3.034  GB/sec
RAM-bw-cross                 4.313     3.427          4.372  GB/sec
2x1-bw-process               3.473     4.708          3.784  GB/sec
3x1-bw-process               5.397     4.983          5.399  GB/sec
4x1-bw-process               5.040     8.775          5.098  GB/sec
8x1-bw-process               8.989     6.862         13.745  GB/sec
8x1-bw-process-NOTHP         8.457    19.094          8.118  GB/sec
16x1-bw-process             13.482    23.067         15.138  GB/sec
4x1-bw-thread               14.904    18.258          9.713  GB/sec
8x1-bw-thread               24.160    29.153         12.495  GB/sec
16x1-bw-thread              41.283    36.642         32.140  GB/sec
32x1-bw-thread              46.983    43.068         48.153  GB/sec
2x3-bw-thread                9.718    15.344         10.846  GB/sec
4x4-bw-thread               12.602    15.758         13.148  GB/sec
4x6-bw-thread               13.807    11.278         18.540  GB/sec
4x8-bw-thread               13.316    11.677         22.795  GB/sec
4x8-bw-thread-NOTHP         12.548    21.797         30.807  GB/sec
3x3-bw-thread               13.500    18.758         18.569  GB/sec
5x5-bw-thread               14.575    14.199         36.521  GB/sec
2x16-bw-thread              11.345    11.434         19.569  GB/sec
1x32-bw-thread              14.123    10.586         14.587  GB/sec
numa02-bw                   50.963    44.092         53.419  GB/sec
numa02-bw-NOTHP             50.553    42.724         51.106  GB/sec
numa01-bw-thread            33.724    33.050         37.801  GB/sec
numa01-bw-thread-NOTHP      39.064    35.139         43.314  GB/sec
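
On the question of how to quantify which of the three kernels did best on
bandwidth: one possible approach (only a sketch, not something that was run
for this mail) is to take the geometric mean of each kernel's per-test ratio
against the 3.9.0 baseline, so that no single large test dominates. The
helper name geomean_ratio and the choice of sample rows are assumptions made
for illustration; the numbers are copied from the first bandwidth table
above.

```python
import math


def geomean_ratio(rows, column, higher_is_better):
    """Geometric mean of per-test improvement over the baseline.

    rows: list of (testcase, (baseline, candidate1, candidate2)) tuples.
    column: index of the candidate inside the value tuple (baseline is 0).
    A result above 1.0 means the candidate is better than the baseline
    on average; below 1.0 means it is worse.
    """
    logs = []
    for _name, values in rows:
        base, cand = values[0], values[column]
        # Normalise so that "bigger ratio" always means "better".
        ratio = cand / base if higher_is_better else base / cand
        logs.append(math.log(ratio))
    return math.exp(sum(logs) / len(logs))


# Hypothetical subset: a few rows from the first "(Higher is better)" table,
# in the order (3.9.0, Mels v5, this_patchset).
bw_rows = [
    ("RAM-bw-cross", (4.397, 3.425, 4.374)),
    ("4x1-bw-thread", (19.147, 17.964, 9.622)),
    ("4x4-bw-thread", (18.309, 16.382, 45.384)),
    ("numa02-bw", (48.999, 44.173, 54.795)),
]

for label, col in (("Mels v5", 1), ("this_patchset", 2)):
    score = geomean_ratio(bw_rows, col, higher_is_better=True)
    print(f"{label}: {score:.3f}x vs 3.9.0")
```

For the convergence tables the same helper can be called with
higher_is_better=False, since lower times are better there. A single summary
number will hide the per-test regressions, so it is best read alongside the
full tables.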