Hi,

This version uses a new buffer flag, holesize, as Dave Chinner suggested.

The problem:
A fiemap operation on a very large sparse file can take an extremely
long time (we're talking days here), because __generic_block_fiemap()
searches block by block whenever it encounters a hole.

The solution:
Allow the underlying file system to return the hole size so that
__generic_block_fiemap() can skip the hole quickly.

Preamble:
When the fs-specific block_map() function finds a hole, it can return
the hole size in b_size. This is efficient because the file system
doesn't need to figure out the block mapping a second time to determine
the hole size. The patch uses a new buffer_holesize flag to tell when
the fs-specific block_map() is passing back the hole size. If the
fs-specific block_map() doesn't set the buffer_holesize bit,
__generic_block_fiemap() assumes a hole size of one block, as before.

Other file systems that want to take advantage of the new "hole size"
functionality need only write their own function to determine the hole
size, call it from their respective block_map() function, and call
set_buffer_holesize() to put it into use. I've written a simple patch
to GFS2 that does just that, as a follow-on.

Patch description:
This patch changes __generic_block_fiemap() so that if the fs-specific
block_map() sets the buffer_holesize flag for a hole, it takes the
returned b_size to be the size of the hole, in bytes. This is much
faster than trying each block individually when large holes are
encountered.
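To make the cost difference concrete, here is a small user-space model
of the hole-skipping loop. It is only a sketch, not kernel code: struct
model_bh, struct model_file, model_get_block() and model_scan() are
hypothetical stand-ins for struct buffer_head, the inode, the
fs-specific block_map() and __generic_block_fiemap(), and the model
assumes a file with a single data extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct buffer_head (only what the model needs). */
struct model_bh {
	uint64_t b_size;	/* out: bytes mapped, or hole size in bytes */
	bool mapped;		/* models buffer_mapped() */
	bool holesize;		/* models the proposed buffer_holesize() */
};

/* Sparse file with one data extent; all positions are in blocks. */
struct model_file {
	uint64_t nr_blocks;	/* file size in blocks */
	uint64_t data_start;	/* first block of the data extent */
	uint64_t data_len;	/* length of the data extent */
	unsigned int blkbits;	/* log2 of the block size */
	bool reports_holes;	/* does this "fs" set the holesize flag? */
};

/* Models an fs-specific block_map(): maps data, optionally sizes holes. */
static void model_get_block(const struct model_file *f, uint64_t blk,
			    struct model_bh *bh)
{
	uint64_t data_end = f->data_start + f->data_len;

	bh->mapped = false;
	bh->holesize = false;
	if (blk >= f->data_start && blk < data_end) {
		bh->mapped = true;
		bh->b_size = (data_end - blk) << f->blkbits;
		return;
	}
	if (f->reports_holes) {
		/* The hole runs to the next extent or to end of file. */
		uint64_t hole_end = blk < f->data_start ? f->data_start
							: f->nr_blocks;

		bh->holesize = true;
		bh->b_size = (hole_end - blk) << f->blkbits;
	}
}

/* Models the scan loop; returns the number of block_map() calls made. */
static uint64_t model_scan(const struct model_file *f, uint64_t *data_blocks)
{
	uint64_t calls = 0, blk = 0;

	*data_blocks = 0;
	while (blk < f->nr_blocks) {
		struct model_bh bh = { 0 };
		uint64_t step = 1;	/* default: advance one block */

		model_get_block(f, blk, &bh);
		calls++;
		if (bh.mapped) {
			step = bh.b_size >> f->blkbits;
			*data_blocks += step;
		} else if (bh.holesize) {
			/* Same conversion as logical_to_blk() in the patch. */
			step = bh.b_size >> f->blkbits;
		}
		assert(step > 0);	/* the scan must always make progress */
		blk += step;
	}
	return calls;
}
```

For a 1,000,000-block file whose only data is ten blocks starting at
block 999,000, the model makes 999,991 block_map() calls when the flag
is never set and 3 calls when it is. The real function also handles
extent merging and the past-EOF case, which the model leaves out.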
Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@xxxxxxxxxx>
---
 fs/ioctl.c                  | 7 ++++++-
 include/linux/buffer_head.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 8ac3fad..121ba6f 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -291,13 +291,18 @@ int __generic_block_fiemap(struct inode *inode,
 		memset(&map_bh, 0, sizeof(struct buffer_head));
 		map_bh.b_size = len;
 
+		clear_buffer_holesize(&map_bh);
 		ret = get_block(inode, start_blk, &map_bh, 0);
 		if (ret)
 			break;
 
 		/* HOLE */
 		if (!buffer_mapped(&map_bh)) {
-			start_blk++;
+			if (buffer_holesize(&map_bh))
+				start_blk += logical_to_blk(inode,
+							    map_bh.b_size);
+			else
+				start_blk++;
 
 			/*
 			 * We want to handle the case where there is an
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 324329c..39ed1f1 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -37,6 +37,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Holesize,	/* Hole encountered, hole size returned */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -128,6 +129,7 @@ BUFFER_FNS(Boundary, boundary)
 BUFFER_FNS(Write_EIO, write_io_error)
 BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
+BUFFER_FNS(Holesize, holesize)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 