On Mon, 2008-06-30 at 15:36 +0530, Aneesh Kumar K.V wrote: > From: Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > > It can happen that buffers are removed from the page before it gets > marked dirty and then is passed to writepage(). In writepage() > we just initialize the buffers and check whether they are mapped and non > delay. If they are mapped and non delay we write the page. Otherwise we > mark then dirty. With this change we don't do block allocation at all in > ext4_*_write_page. > > writepage() get called under many condition and with a locking order > of journal_start -> lock_page we shouldnot try to allocate blocks in > writepage() which get called after taking page lock. writepage can get > called via shrink_page_list even with a journal handle which was created > for doing inode update. For example when doing ext4_da_write_begin we > create a journal handle with credit 1 expecting a i_disksize update for > the inode. But ext4_da_write_begin can cause shrink_page_list via > _grab_page_cache. So having a valid handle via ext4_journal_current_handle > is not a guarantee that we can use the handle for block allocation in > writepage. We should not be using credits which are taken for other updates. > That would result in we running out of credits when we update inodes. > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > --- > fs/ext4/inode.c | 169 ++++++++++++++++++++++++++++++++++++++++--------------- > 1 files changed, 124 insertions(+), 45 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index cd5c165..18af94a 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1591,11 +1591,15 @@ static int ext4_da_get_block_write(struct inode *inode, sector_t iblock, > handle_t *handle = NULL; > > handle = ext4_journal_current_handle(); > - BUG_ON(handle == NULL); > - BUG_ON(create == 0); > - > - ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, > + if (!handle) { > + ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, > + bh_result, 0, 0, 0); > + BUG_ON(!ret); > + } else { > + ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks, > bh_result, create, 0, EXT4_DELALLOC_RSVED); > + } > + > if (ret > 0) { > bh_result->b_size = (ret << inode->i_blkbits); > > @@ -1633,15 +1637,37 @@ static int ext4_da_get_block_write(struct inode *inode, sector_t iblock, > > static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh) > { > - return !buffer_mapped(bh) || buffer_delay(bh); > + /* > + * unmapped buffer is possible for holes. > + * delay buffer is possible with delayed allocation > + */ > + return ((!buffer_mapped(bh) || buffer_delay(bh)) && buffer_dirty(bh)); > +} > + Could you clarify why we need to add buffer_dirty check here. the comment about "unmapped buffer is possible for holes" seems trying to clarifying that, but it doesn't help me much:) But the problem of above change, which adding buffer_dirty(bh) check here, will cause i_disksize updated too earliy in ext4_da_write_end(), Ext4_da_write_end() uses ext4_bh_unmapped_or_delay() function to check if it extend the file size without need for allocation. But at that time the buffer has not being dirtied yet (done in code later in block_commit_write()), so it always return true and update i_disksize (before block allocation). we could fix that ext4_da_write_end() to not use this helper function, also there is another issue : The i_disksize is updated at ext4_da_write_end() time if writes to the end of file, and the buffers are all have blocks allocated. But if the page had one buffer marked as buffer_delay, and the write is at EOF and on a buffer has block already allocated, we certainly need to extend the i_disksize. patch below... Signed-off-by: Mingming Cao <cmm@xxxxxxxxxx> --- fs/ext4/inode.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) Index: linux-2.6.26-rc8/fs/ext4/inode.c =================================================================== --- linux-2.6.26-rc8.orig/fs/ext4/inode.c 2008-07-01 15:13:00.000000000 -0700 +++ linux-2.6.26-rc8/fs/ext4/inode.c 2008-07-01 15:34:19.000000000 -0700 @@ -1892,6 +1892,31 @@ out: return ret; } +/* + * Check if we should update i_disksize + * when write to the end of file but not require block allocation + */ +static int ext4_da_should_update_i_disksize(struct page *page, + unsigned long offset) +{ + struct buffer_head *head, *bh; + unsigned int curr_off = 0; + + head = page_buffers(page); + bh = head; + do { + unsigned int next_off = curr_off + bh->b_size; + + if ((curr_off >= offset) && + (!buffer_mapped || (buffer_delay(bh))) { + return 0; + } + curr_off = next_off; + } while ((bh = bh->b_this_page) != head); + + return 1; +} + static int ext4_da_write_end(struct file *file, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -1901,6 +1926,10 @@ static int ext4_da_write_end(struct file int ret = 0, ret2; handle_t *handle = ext4_journal_current_handle(); loff_t new_i_size; + unsigned long start, end; + + start = pos & (PAGE_CACHE_SIZE - 1); + end = start + copied; /* * generic_write_end() will run mark_inode_dirty() if i_size @@ -1910,8 +1939,7 @@ static int ext4_da_write_end(struct file new_i_size = pos + copied; if (new_i_size > EXT4_I(inode)->i_disksize) - if (!walk_page_buffers(NULL, page_buffers(page), - 0, len, NULL, ext4_bh_unmapped_or_delay)){ + if (ext4_da_should_update_i_disksize(page, end)) /* * Updating i_disksize when extending file without * need block allocation -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html