Hi all, This is my fifth try to implement the second step of extent status tree. The patch set can be divided into the following parts. Patch 1/10 This patch refines the extent status tree Patch 2/10-6/10 These patches try to track all extent status in extent status tree and make it as a extent cache. In extent_status structure bit field is removed because we get some warnings from 'sparse'. Now es_pblk and es_status are manipulated by ext4_es_*_pblock and ext4_es_*_status directly. Currently when an unwritten extent is allocated, we never know it from map->m_flags because ext4_ext_map_blocks doesn't return EXT4_MAP_UNWRITTEN flag. A patch fixes it and we can determine the extent status according to m_flags. According to Jan's feedback, we put the hole into extent cache to avoid to access extent tree in disk as far as possible. Here if the whole file is a hole, this hole will not be cached in extent status tree because it is always splitted immediately. Meanwhile the hole will not be cached when ext4_da_map_blocks looks up a block mapping because this hole will be as a delayed extent later. Patch 7/10-8/10 This two patches try to reclaim memory from extent status tree when we are under a high memeory pressure. Patch 9/10-10/10 Thses patches are picked up again from 1st version because I aware that they could remove a bogus wait in ext4_ind_direct_IO when dioread_nolock is enabled. After applied them, the latency of dio read can be reduced. I measure it using fio and the result shows as below. config file ----------- [global] ioengine=psync direct=1 bs=4k thread group_reporting directory=/mnt/sda1/ filename=testfile filesize=10g size=10g runtime=120 iodepth=16 [fio] rw=randrw numjobs=4 result ------ w/ bogus wait read : io=1508.1MB, bw=12876KB/s, iops=3218 , runt=120001msec clat (usec): min=128 , max=268738 , avg=718.62, stdev=3703.97 lat (usec): min=128 , max=268739 , avg=718.78, stdev=3703.97 write: io=1505.2MB, bw=12843KB/s, iops=3210 , runt=120001msec clat (usec): min=47 , max=991727 , avg=520.94, stdev=3451.63 lat (usec): min=47 , max=991727 , avg=521.31, stdev=3451.63 w/o bogus wait read : io=1576.4MB, bw=13451KB/s, iops=3362 , runt=120001msec clat (usec): min=128 , max=283906 , avg=685.88, stdev=2762.64 lat (usec): min=128 , max=283907 , avg=686.05, stdev=2762.64 write: io=1577.9MB, bw=13458KB/s, iops=3364 , runt=120001msec clat (usec): min=48 , max=977942 , avg=498.97, stdev=3093.08 lat (usec): min=48 , max=977943 , avg=499.33, stdev=3093.08 >From the result we can see that the avg. of latency could be reduced a little. changelog: v5 <- v4: - drop a patch that removes EXT4_MAP_FROM_CLUSTER flag (I will revise it in the patch set of get_block_t refinement) - fold original patch 3/9 into patch 4/9 - manipulate es_pblk and es_status directly (bit field is removed because it causes some warnings from 'sparse') - let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag - rename ext4_es_find_extent with ext4_es_find_delayed_extent - add hole status and put hole into extent status tree as a cache - convert unwritten extents from extent status tree in ext4_ext_direct_IO and end_io callback - remove a bogus wait in ext4_ind_direct_IO when dioread_nolock is enabled v4 <- v3: - register a normal shrinker to reclaim extent from extent status tree v3 <- v2: - use prune_super() to reclaim extents from extent status tree - stashed es_status into es_pblk - remove single extent cache - rebase against 3.8-rc4 v2 <- v1: - drop patches that try to improve unwritten extent conversion - remove EXT4_MAP_FROM_CLUSTER flag - add tracepoint for ext4_es_lookup_extent() - drop a patch, which tries to fix a warning when bigalloc and delalloc are enabled - add a shrinker to reclaim memory from extent status tree - rebase against 3.8-rc2 v4: http://lwn.net/Articles/536037/ v3: http://lwn.net/Articles/533730/ v2: http://lwn.net/Articles/532446/ v1: http://lwn.net/Articles/531065/ As always, any comments or feedbacks are welcome. FWIW, when I try to implement patch 3/10, I realize that get_block_t and *_map_blocks functions need to be refactored because in ext4 we already have six get_block_t functions - ext4_get_block - ext4_get_block_write - ext4_get_block_write_nolock - noalloc_get_block_write - ext4_da_get_block_prep - _ext4_get_block and four *_map_blocks - ext4_map_blocks - ext4_da_map_blocks - ext4_ext_map_blocks - ext4_ind_map_blocks So I am planning to refine them. First I will try to split ext4_map_blocks into two parts, e.g. ext4_map_blocks_read and ext4_map_blocks_write, and then try other cleanups and improvmentes. Thanks, - Zheng Zheng Liu (10): ext4: refine extent status tree ext4: add physical block and status member into extent status tree ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag ext4: track all extent status in extent status tree ext4: lookup block mapping in extent status tree ext4: remove single extent cache ext4: adjust some functions for reclaiming extents from extent status tree ext4: reclaim extents from extent status tree ext4: convert unwritten extents from extent status tree in end_io ext4: remove bogus wait for unwritten extents in ext4_ind_direct_IO fs/ext4/ext4.h | 21 +- fs/ext4/ext4_extents.h | 6 - fs/ext4/extents.c | 211 ++++-------- fs/ext4/extents_status.c | 779 +++++++++++++++++++++++++++++++++++--------- fs/ext4/extents_status.h | 84 ++++- fs/ext4/file.c | 16 +- fs/ext4/indirect.c | 5 - fs/ext4/inode.c | 148 +++++++-- fs/ext4/move_extent.c | 3 - fs/ext4/page-io.c | 8 +- fs/ext4/super.c | 8 +- include/trace/events/ext4.h | 207 ++++++++++-- 12 files changed, 1075 insertions(+), 421 deletions(-) -- 1.7.12.rc2.18.g61b472e -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html