On Wed, Apr 14, 2010 at 12:36:59AM +1000, Dave Chinner wrote:
> On Tue, Apr 13, 2010 at 08:39:29PM +0900, KOSAKI Motohiro wrote:
> > > FWIW, the biggest problem here is that I have absolutely no clue on
> > > how to test what the impact on lumpy reclaim really is. Does anyone
> > > have a relatively simple test that can be run to determine what the
> > > impact is?
> >
> > So, can you please run two workloads concurrently?
> > 	- Normal IO workload (fio, iozone, etc..)
> > 	- echo $NUM > /proc/sys/vm/nr_hugepages
>
> What do I measure/observe/record that is meaningful?

So, a rough as guts first pass - just run a large dd (8 times the
size of memory - 8GB file vs 1GB RAM) and repeatedly try to allocate
the whole of memory in huge pages (500) every 5 seconds. The IO
rate is roughly 100MB/s, so it takes 75-85s to complete the dd.

The script:

$ cat t.sh
#!/bin/bash

# start from a clean slate: no huge pages, empty page cache
echo 0 > /proc/sys/vm/nr_hugepages
echo 3 > /proc/sys/vm/drop_caches

# background IO load: roughly 8GB of writes
dd if=/dev/zero of=/mnt/scratch/test bs=1024k count=8000 > /dev/null 2>&1 &

# every 5s, time an attempt to grow the pool to 500 huge pages and
# record how many pages were actually allocated
(
for i in `seq 1 1 20`; do
	sleep 5
	/usr/bin/time --format="wall %e" sh -c "echo 500 > /proc/sys/vm/nr_hugepages" 2>&1
	grep HugePages_Total /proc/meminfo
done
) | awk '
	/wall/ { wall += $2; cnt += 1 }
	/Pages/ { pages[cnt] = $2 }
	END { printf "average wall time %f\nPages step: ", wall / cnt ;
		for (i = 1; i <= cnt; i++) {
			printf "%d ", pages[i];
		}
	}'
----

And the output looks like:

$ sudo ./t.sh
average wall time 0.954500
Pages step: 97 101 101 121 173 173 173 173 173 173 175 194 195 195 202 220 226 419 423 426 $

Run 50 times in a loop, with the outputs averaged, the existing lumpy
reclaim resulted in:

dave@test-1:~$ cat current.txt | awk -f av.awk
av. wall = 0.519385 secs
av Pages step: 192 228 242 255 265 272 279 284 289 294 298 303 307 322 342 366 383 401 412 420

And with my patch that disables ->writepage:

dave@test-1:~$ cat no-direct.txt | awk -f av.awk
av. wall = 0.554163 secs
av Pages step: 231 283 310 316 323 328 336 340 345 351 356 359 364 377 388 397 413 423 432 439

Basically, with my patch lumpy reclaim was *substantially* more
effective, with only a slight increase in average allocation latency
(about 0.55s vs 0.52s) in this test case.

I need to add a marker to the output that records when the dd
completes, but from monitoring the writeback rates via PCP, they
were in the ballpark of 85-100MB/s for the existing code, and
95-110MB/s with my patch. Hence it improved both IO throughput and
the effectiveness of lumpy reclaim.

On the downside, I did have an OOM killer invocation with my patch
after about 150 iterations - dd failed an order zero allocation
because there were 455 huge pages allocated and there were only
_320_ available pages for IO, all of which were under IO. i.e. lumpy
reclaim worked so well that the machine got into order-0 page
starvation.

I know this is a simple test case, but it shows much better results
than I think anyone (even me) was expecting...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
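
The av.awk script used to produce the averaged figures is not included
in the message above. What follows is only a rough sketch of what such
an averaging step might look like, not the script that was actually
used. It assumes that the output of each t.sh run was appended to the
results file with a trailing newline (t.sh itself does not print one),
so that every "average wall time" and "Pages step:" line starts on its
own line. The function name average_runs is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for av.awk (the real script is not shown in the mail).
# Reads concatenated t.sh summaries on stdin and prints per-step averages.
# Assumed input, one pair of lines per run:
#   average wall time <secs>
#   Pages step: <n1> <n2> ...
average_runs() {
	awk '
	# accumulate the per-run average wall time and count the runs
	/^average wall time/ { wall += $4; runs++ }

	# accumulate the huge page count seen at each 5 second step
	/^Pages step:/ {
		for (i = 3; i <= NF; i++)
			pages[i - 2] += $i
		if (NF - 2 > steps)
			steps = NF - 2
	}

	END {
		if (runs == 0)
			exit 1
		printf "av. wall = %f secs\n", wall / runs
		printf "av Pages step:"
		for (i = 1; i <= steps; i++)
			printf " %d", pages[i] / runs
		printf "\n"
	}'
}

# Example use (file name taken from the mail above):
#   average_runs < current.txt
```

Averaging per step rather than per run keeps the shape of the
allocation curve visible, which is what the comparison between the two
kernels relies on.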