I'm doing some benchmarking on the SCSI layout, and I ran into a case where bonnie++ seemingly stopped making forward progress.  bl_pg_init_write() wants to figure out how big the layout should be, and it uses page_cache_next_hole() pretty aggressively whenever the inode size doesn't match the number of pages in the mapping.  The problem is that page_cache_next_hole() is fairly stupid about it: it just walks the page cache radix tree one index at a time, doing a full lookup for each page.  The end result is that for fairly large files (>4G) my machine spends all its time in __radix_tree_lookup(), and I might as well just use regular NFS.

Here's some bash I use to reproduce the problem (note the truncate is 4G + 1):

[root@gfs-a24c-02 local_spc4]# cat <(dd if=/dev/zero bs=1M count=4096) - > bar &
[1] 2000
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 10.7775 s, 399 MB/s

[1]+  Stopped                 cat <(dd if=/dev/zero bs=1M count=4096) - > bar
[root@gfs-a24c-02 local_spc4]# sync
[root@gfs-a24c-02 local_spc4]# truncate -s 4294967297 bar
[root@gfs-a24c-02 local_spc4]# dd if=/dev/zero of=bar bs=1M count=4096 conv=notrunc
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 82.0328 s, 52.4 MB/s

This performance problem gets far worse on larger address_spaces.

A couple of ways to fix it spring to mind: make page_cache_next_hole() less stupid (I've appended a sketch of its current per-index walk below), or, instead of trying to figure out what wb_size should be in pg_init, provide a way for pg_init to look up a matching lseg beforehand.  Or maybe add an lsize parameter?

I'm going to continue to flail around trying to determine the best way to fix this unless sound advice is offered.  I don't know enough about the page cache to improve page_cache_next_hole() myself, nor do I have a good sense of how acceptable the other two approaches would be.

Thanks for any input.

Ben
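
For reference, here is roughly where the time goes.  This is a sketch approximating the page_cache_next_hole() in mm/filemap.c rather than a verbatim copy (the radix_tree_lookup()/radix_tree_exceptional_entry() details are from memory), so double-check it against whatever tree you're on:

/*
 * Approximate shape of page_cache_next_hole(): one full radix tree
 * descent per page index until a hole (or exceptional entry) is hit.
 * For a fully-cached >4G file that is on the order of a million
 * __radix_tree_lookup() calls each time bl_pg_init_write() asks.
 */
pgoff_t page_cache_next_hole(struct address_space *mapping,
			     pgoff_t index, unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		struct page *page;

		page = radix_tree_lookup(&mapping->page_tree, index);
		if (!page || radix_tree_exceptional_entry(page))
			break;
		index++;
		if (index == 0)
			break;
	}

	return index;
}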