[restore CC list]

> > I'm trying to understand where the performance gain comes from.
> >
> > I noticed that in all cases, before/after patchset, nr_vmscan_write are all zero.
> >
> > nr_vmscan_immediate_reclaim is significantly reduced though:
>
> That's a good thing, it means we burn less CPU time on skipping
> through dirty pages on the LRU.
>
> Until a certain priority level, the dirty pages encountered on the LRU
> list are marked PageReclaim and put back on the list, this is the
> nr_vmscan_immediate_reclaim number. And only below that priority, we
> actually ask the FS to write them, which is nr_vmscan_write.

Yes, it is.

> I suspect this is where the performance improvement comes from: we
> find clean pages for reclaim much faster.

That explains how it could reduce CPU overheads. However the dd's are
throttled anyway, so I still don't understand how the speedup of dd page
allocations improve the _IO_ performance.

> > $ ./compare.rb -g 1000M -e nr_vmscan_immediate_reclaim thresh*/*-ioless-full-nfs-wq5-next-20111014+ thresh*/*-ioless-full-per-zone-dirty-next-20111014+
> > 3.1.0-rc9-ioless-full-nfs-wq5-next-20111014+  3.1.0-rc9-ioless-full-per-zone-dirty-next-20111014+
> > ------------------------  ------------------------
> >                560289.00   -98.5%      8145.00  thresh=1000M/btrfs-100dd-4k-8p-4096M-1000M:10-X
> >                576882.00   -98.4%      9511.00  thresh=1000M/btrfs-10dd-4k-8p-4096M-1000M:10-X
> >                651258.00   -98.8%      7963.00  thresh=1000M/btrfs-1dd-4k-8p-4096M-1000M:10-X
> >               1963294.00   -85.4%    286815.00  thresh=1000M/ext3-100dd-4k-8p-4096M-1000M:10-X
> >               2108028.00   -10.6%   1885114.00  thresh=1000M/ext3-10dd-4k-8p-4096M-1000M:10-X
> >               2499456.00   -99.9%      2061.00  thresh=1000M/ext3-1dd-4k-8p-4096M-1000M:10-X
> >               2534868.00   -78.5%    545815.00  thresh=1000M/ext4-100dd-4k-8p-4096M-1000M:10-X
> >               2921668.00   -76.8%    677177.00  thresh=1000M/ext4-10dd-4k-8p-4096M-1000M:10-X
> >               2841049.00  -100.0%       779.00  thresh=1000M/ext4-1dd-4k-8p-4096M-1000M:10-X
> >               2481823.00   -86.3%    339342.00  thresh=1000M/xfs-100dd-4k-8p-4096M-1000M:10-X
> >               2508629.00   -87.4%    316614.00  thresh=1000M/xfs-10dd-4k-8p-4096M-1000M:10-X
> >               2656628.00  -100.0%       678.00  thresh=1000M/xfs-1dd-4k-8p-4096M-1000M:10-X
> >              24303872.00   -83.2%   4080014.00  TOTAL nr_vmscan_immediate_reclaim
> >
> > If you'd like to compare any other vmstat items before/after patch,
> > let me know and I'll run the compare script to find them out.
>
> I will come back to you on this, so tired right now. But I find your
> scripts interesting ;-) Are those released and available for download
> somewhere? I suspect every kernel hacker has their own collection of
> scripts to process data like this, maybe we should pull them all
> together and put them into a git tree!

Thank you for the interest :-)

I used to upload my writeback test scripts to kernel.org. However its
file service is not restored yet. So I attach the compare script here.
It's a bit hacky for now, which I hope can be improved over time to be
useful to other projects as well.

Thanks,
Fengguang

Attachment:
compare.rb
Description: application/ruby
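
For readers without the attachment, the middle column of the quoted table can be reproduced from the two value columns. The following is only a minimal sketch in Python, not the attached compare.rb: it assumes the percentage is the relative change (after - before) / before, which matches every row quoted in this message. The names `percent_change`, `format_row` and `ROWS` are hypothetical, and the short row labels stand in for the full thresh=1000M/... case names.

```python
# Hypothetical sketch: recompute the percentage column of the quoted
# nr_vmscan_immediate_reclaim comparison. This is NOT compare.rb.

def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (assumed formula)."""
    if before == 0:
        raise ValueError("baseline value is zero, relative change is undefined")
    return (after - before) / before * 100.0


def format_row(label: str, before: float, after: float) -> str:
    """Render one row in roughly the before / change / after / label layout."""
    pct = percent_change(before, after)
    return f"{before:14.2f} {pct:+8.1f}% {after:14.2f}  {label}"


# (before, after) pairs copied from the table quoted in this message.
ROWS = {
    "btrfs-100dd": (560289.0, 8145.0),
    "btrfs-10dd": (576882.0, 9511.0),
    "btrfs-1dd": (651258.0, 7963.0),
    "ext3-100dd": (1963294.0, 286815.0),
    "ext3-10dd": (2108028.0, 1885114.0),
    "ext3-1dd": (2499456.0, 2061.0),
    "ext4-100dd": (2534868.0, 545815.0),
    "ext4-10dd": (2921668.0, 677177.0),
    "ext4-1dd": (2841049.0, 779.0),
    "xfs-100dd": (2481823.0, 339342.0),
    "xfs-10dd": (2508629.0, 316614.0),
    "xfs-1dd": (2656628.0, 678.0),
}

# Column sums, which should agree with the TOTAL line of the table.
total_before = sum(before for before, _ in ROWS.values())
total_after = sum(after for _, after in ROWS.values())

for label, (before, after) in ROWS.items():
    print(format_row(label, before, after))
print(format_row("TOTAL nr_vmscan_immediate_reclaim", total_before, total_after))
```

Summing the columns in the sketch gives the same totals as the TOTAL line, so that line appears to be a plain sum over the twelve cases rather than an average.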