Re: [PATCH updated] ext4: Handle page without buffers in ext4_*_writepage()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 2008-06-30 at 15:36 +0530, Aneesh Kumar K.V wrote:
> From: Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> 
> It can happen that buffers are removed from the page before it gets
> marked dirty and then is passed to writepage(). In writepage()
> we just initialize the buffers and check whether they are mapped and non
> delay. If they are mapped and non delay we write the page. Otherwise we
> mark then dirty. With this change we don't do block allocation at all in
> ext4_*_write_page.
> 
> writepage() get called under many condition and with a locking order
> of journal_start -> lock_page we shouldnot try to allocate blocks in
> writepage() which get called after taking page lock. writepage can get
> called via shrink_page_list even with a journal handle which was created
> for doing inode update. For example when doing ext4_da_write_begin we
> create a journal handle with credit 1 expecting a i_disksize update for
> the inode. But ext4_da_write_begin can cause shrink_page_list via
> _grab_page_cache. So having a valid handle via ext4_journal_current_handle
> is not a guarantee that we can use the handle for block allocation in
> writepage. We should not be using credits which are taken for other updates.
> That would result in we running out of credits when we update inodes.
> 
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> ---
>  fs/ext4/inode.c |  169 ++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 124 insertions(+), 45 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cd5c165..18af94a 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1591,11 +1591,15 @@ static int ext4_da_get_block_write(struct inode *inode, sector_t iblock,
>  	handle_t *handle = NULL;
> 
>  	handle = ext4_journal_current_handle();
> -	BUG_ON(handle == NULL);
> -	BUG_ON(create == 0);
> -
> -	ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks,
> +	if (!handle) {
> +		ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks,
> +				   bh_result, 0, 0, 0);
> +		BUG_ON(!ret);
> +	} else {
> +		ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks,
>  				   bh_result, create, 0, EXT4_DELALLOC_RSVED);
> +	}
> +
>  	if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> 
> @@ -1633,15 +1637,37 @@ static int ext4_da_get_block_write(struct inode *inode, sector_t iblock,
> 
>  static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
>  {
> -	return !buffer_mapped(bh) || buffer_delay(bh);
> +	/*
> +	 * unmapped buffer is possible for holes.
> +	 * delay buffer is possible with delayed allocation
> +	 */
> +	return ((!buffer_mapped(bh) || buffer_delay(bh)) && buffer_dirty(bh));
> +}
> +

Could you clarify why we need to add buffer_dirty check here. the
comment about "unmapped buffer is possible for holes" seems trying to
clarifying that, but it doesn't help me much:)


But the problem of above change, which adding buffer_dirty(bh) check
here, will cause i_disksize updated too earliy in ext4_da_write_end(), 

Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check
if it extend the file size without need for allocation. But at that time
the buffer has not being dirtied yet (done in code later in
block_commit_write()), so it always return true and update i_disksize
(before block allocation). we could fix that ext4_da_write_end() to not
use this helper function, also there is another issue :

The i_disksize is updated at ext4_da_write_end() time if
writes to the end of file, and the buffers are all have
blocks allocated. But if the page had one buffer marked
as buffer_delay, and the write is at EOF and on a buffer has block
already allocated, we certainly need to extend the i_disksize.

patch below...

Signed-off-by: Mingming Cao <cmm@xxxxxxxxxx>
---
 fs/ext4/inode.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc8/fs/ext4/inode.c
===================================================================
--- linux-2.6.26-rc8.orig/fs/ext4/inode.c	2008-07-01 15:13:00.000000000 -0700
+++ linux-2.6.26-rc8/fs/ext4/inode.c	2008-07-01 15:34:19.000000000 -0700
@@ -1892,6 +1892,31 @@ out:
 	return ret;
 }
 
+/*
+ * Check if we should update i_disksize
+ * when write to the end of file but not require block allocation
+ */
+static int ext4_da_should_update_i_disksize(struct page *page,
+					 unsigned long offset)
+{
+	struct buffer_head *head, *bh;
+	unsigned int curr_off = 0;
+
+	head = page_buffers(page);
+	bh = head;
+	do {
+		unsigned int next_off = curr_off + bh->b_size;
+
+		if ((curr_off >= offset) &&
+		   (!buffer_mapped || (buffer_delay(bh))) {
+			return 0;
+		}
+		curr_off = next_off;
+	} while ((bh = bh->b_this_page) != head);
+
+	return 1;
+}
+
 static int ext4_da_write_end(struct file *file,
 				struct address_space *mapping,
 				loff_t pos, unsigned len, unsigned copied,
@@ -1901,6 +1926,10 @@ static int ext4_da_write_end(struct file
 	int ret = 0, ret2;
 	handle_t *handle = ext4_journal_current_handle();
 	loff_t new_i_size;
+	unsigned long start, end;
+
+	start = pos & (PAGE_CACHE_SIZE - 1);
+	end = start + copied;
 
 	/*
 	 * generic_write_end() will run mark_inode_dirty() if i_size
@@ -1910,8 +1939,7 @@ static int ext4_da_write_end(struct file
 
 	new_i_size = pos + copied;
 	if (new_i_size > EXT4_I(inode)->i_disksize)
-		if (!walk_page_buffers(NULL, page_buffers(page),
-				       0, len, NULL, ext4_bh_unmapped_or_delay)){
+		if (ext4_da_should_update_i_disksize(page, end))
 			/*
 			 * Updating i_disksize when extending file without
 			 * need block allocation









--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux