[PATCH 00/10 v5] ext4: extent status tree (step2)

Hi all,

This is my fifth attempt at implementing the second step of the extent status tree.
The patch set can be divided into the following parts.

Patch 1/10
  This patch refines the extent status tree

Patch 2/10-6/10
  These patches track the status of all extents in the extent status tree
so that it can serve as an extent cache.  The bit field in struct
extent_status has been removed because it triggered warnings from 'sparse';
es_pblk and es_status are now manipulated directly through the
ext4_es_*_pblock and ext4_es_*_status helpers.  Currently, when an unwritten
extent is allocated, there is no way to tell from map->m_flags, because
ext4_ext_map_blocks does not return the EXT4_MAP_UNWRITTEN flag.  Patch 3/10
fixes this, so the extent status can now be determined from m_flags.
  Following Jan's feedback, holes are also put into the extent cache so that
accesses to the on-disk extent tree are avoided as much as possible.  There
are two exceptions.  If the whole file is a hole, that hole is not cached in
the extent status tree because it is always split immediately.  Likewise, a
hole is not cached when ext4_da_map_blocks looks up a block mapping, because
that hole will shortly become a delayed extent.

Patch 7/10-8/10
  These two patches reclaim memory from the extent status tree when the
system is under high memory pressure.

Patch 9/10-10/10
  These patches are picked up again from the first version because I
realized that they remove a bogus wait in ext4_ind_direct_IO when
dioread_nolock is enabled.  With them applied, the latency of DIO reads can
be reduced.

I measured this using fio; the results are shown below.

config file
-----------
[global]
ioengine=psync
direct=1
bs=4k
thread
group_reporting
directory=/mnt/sda1/
filename=testfile
filesize=10g
size=10g
runtime=120
iodepth=16

[fio]
rw=randrw
numjobs=4

result
------
w/ bogus wait
  read : io=1508.1MB, bw=12876KB/s, iops=3218 , runt=120001msec
    clat (usec): min=128 , max=268738 , avg=718.62, stdev=3703.97
     lat (usec): min=128 , max=268739 , avg=718.78, stdev=3703.97
  write: io=1505.2MB, bw=12843KB/s, iops=3210 , runt=120001msec
    clat (usec): min=47 , max=991727 , avg=520.94, stdev=3451.63
     lat (usec): min=47 , max=991727 , avg=521.31, stdev=3451.63

w/o bogus wait
  read : io=1576.4MB, bw=13451KB/s, iops=3362 , runt=120001msec
    clat (usec): min=128 , max=283906 , avg=685.88, stdev=2762.64
     lat (usec): min=128 , max=283907 , avg=686.05, stdev=2762.64
  write: io=1577.9MB, bw=13458KB/s, iops=3364 , runt=120001msec
    clat (usec): min=48 , max=977942 , avg=498.97, stdev=3093.08
     lat (usec): min=48 , max=977943 , avg=499.33, stdev=3093.08

From the results we can see that the average latency is reduced slightly
(by roughly 4-5% for both reads and writes).

changelog:
v5 <- v4:
 - drop a patch that removes EXT4_MAP_FROM_CLUSTER flag
   (I will revise it in the patch set of get_block_t refinement)
 - fold original patch 3/9 into patch 4/9
 - manipulate es_pblk and es_status directly
   (bit field is removed because it causes some warnings from 'sparse')
 - let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
 - rename ext4_es_find_extent to ext4_es_find_delayed_extent
 - add hole status and put hole into extent status tree as a cache
 - convert unwritten extents from extent status tree in ext4_ext_direct_IO
   and end_io callback
 - remove a bogus wait in ext4_ind_direct_IO when dioread_nolock is enabled

v4 <- v3:
 - register a normal shrinker to reclaim extents from the extent status tree

v3 <- v2:
 - use prune_super() to reclaim extents from extent status tree
 - stash es_status into es_pblk
 - remove single extent cache
 - rebase against 3.8-rc4

v2 <- v1:
 - drop patches that try to improve unwritten extent conversion
 - remove EXT4_MAP_FROM_CLUSTER flag
 - add tracepoint for ext4_es_lookup_extent()
 - drop a patch that tries to fix a warning when bigalloc and delalloc
   are enabled
 - add a shrinker to reclaim memory from extent status tree
 - rebase against 3.8-rc2

v4: http://lwn.net/Articles/536037/
v3: http://lwn.net/Articles/533730/
v2: http://lwn.net/Articles/532446/
v1: http://lwn.net/Articles/531065/

As always, any comments or feedback are welcome.

FWIW, while implementing patch 3/10, I realized that the get_block_t and
*_map_blocks functions need to be refactored, because ext4 already has six
get_block_t functions:
 - ext4_get_block
 - ext4_get_block_write
 - ext4_get_block_write_nolock
 - noalloc_get_block_write
 - ext4_da_get_block_prep
 - _ext4_get_block

and four *_map_blocks functions:
 - ext4_map_blocks
 - ext4_da_map_blocks
 - ext4_ext_map_blocks
 - ext4_ind_map_blocks

So I am planning to refine them.  First I will try to split ext4_map_blocks
into two parts, e.g. ext4_map_blocks_read and ext4_map_blocks_write, and
then do other cleanups and improvements.

Thanks,
						- Zheng

Zheng Liu (10):
  ext4: refine extent status tree
  ext4: add physical block and status member into extent status tree
  ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
  ext4: track all extent status in extent status tree
  ext4: lookup block mapping in extent status tree
  ext4: remove single extent cache
  ext4: adjust some functions for reclaiming extents from extent status
    tree
  ext4: reclaim extents from extent status tree
  ext4: convert unwritten extents from extent status tree in end_io
  ext4: remove bogus wait for unwritten extents in ext4_ind_direct_IO

 fs/ext4/ext4.h              |  21 +-
 fs/ext4/ext4_extents.h      |   6 -
 fs/ext4/extents.c           | 211 ++++--------
 fs/ext4/extents_status.c    | 779 +++++++++++++++++++++++++++++++++++---------
 fs/ext4/extents_status.h    |  84 ++++-
 fs/ext4/file.c              |  16 +-
 fs/ext4/indirect.c          |   5 -
 fs/ext4/inode.c             | 148 +++++++--
 fs/ext4/move_extent.c       |   3 -
 fs/ext4/page-io.c           |   8 +-
 fs/ext4/super.c             |   8 +-
 include/trace/events/ext4.h | 207 ++++++++++--
 12 files changed, 1075 insertions(+), 421 deletions(-)

-- 
1.7.12.rc2.18.g61b472e

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html