Re: Delayed allocation and page_lock vs transaction start ordering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2008-04-15 at 11:08 -0700, Mingming Cao wrote:
> On Tue, 2008-04-15 at 18:14 +0200, Jan Kara wrote:
> >   Hi,
> > 
> >   I've ported my patch inversing locking ordering of page_lock and
> > transaction start to ext4 (on top of ext4 patch queue). Everything except
> > delayed allocation is converted (the patch is below for interested
> > readers). The question is how to proceed with delayed allocation. Its
> > current implementation in VFS is designed to work well with the old
> > ordering (page lock first, then start a transaction). We could bend it to
> > work with the new locking ordering but I really see no point since ext4 is
> > the only user. 
> 
> I think the plan is port the changes to ext2/3/JFS and support delayed
> allocation on those filesystems. 
> 
> > Also XFS has AFAIK ordering first start transaction, then
> > lock pages so if we should ever merge delayed alloc implementations the new
> > ordering would make it easier.
> >   So what do people think here? Do you agree with reimplementing current
> > mpage_da_... functions?
> 
> It worth a try, but I could not see how to bend delayed allocation to
> work the new ordering:( With delayed allocation Ext4 gets into
> writepage() directly with page locked, but we need to start transaction
> to do block allocation...:(

Looked again it seems possible to reservse the order with delayed
allocation. with ext3_da_writepgaes() we could start the journal before
calling mpage_da_writepages()(which will lock the pages), instead of
start the journal inside ext4_da_get_block_write(). So that we could get
the locking order right. Just need to taking care of the estimated
credits right.

How about this? (untested, just throw out for comment)

---
 fs/ext4/inode.c |   53 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 13 deletions(-)

Index: linux-2.6.25-rc9/fs/ext4/inode.c
===================================================================
--- linux-2.6.25-rc9.orig/fs/ext4/inode.c	2008-04-15 15:40:33.000000000 -0700
+++ linux-2.6.25-rc9/fs/ext4/inode.c	2008-04-15 16:12:27.000000000 -0700
@@ -1437,18 +1437,12 @@ static int ext4_da_get_block_prep(struct
 static int ext4_da_get_block_write(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create)
 {
-	int ret, needed_blocks = ext4_writepage_trans_blocks(inode);
+	int ret;
 	unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
 	loff_t disksize = EXT4_I(inode)->i_disksize;
 	handle_t *handle = NULL;
 
-	if (create) {
-		handle = ext4_journal_start(inode, needed_blocks);
-		if (IS_ERR(handle)) {
-			ret = PTR_ERR(handle);
-			goto out;
-		}
-	}
+	handle = ext4_journal_current_handle();
 
 	ret = ext4_get_blocks_wrap(handle, inode, iblock, max_blocks,
 				   bh_result, create, 0);
@@ -1483,17 +1477,50 @@ static int ext4_da_get_block_write(struc
 		ret = 0;
 	}
 
-out:
-	if (handle && !IS_ERR(handle))
-		ext4_journal_stop(handle);
-
 	return ret;
 }
 
+/*
+ * For now just follow the DIO way to estimate the max credits
+ * needed to write out EXT4_MAX_BUF_BLOCKS pages.
+ * todo: need to calculate the max credits need for
+ * extent based files, currently the DIO credits is based on
+ * indirect-blocks mapping way.
+ *
+ * Probably should have a generic way to calculate credits
+ * for DIO, writepages, and truncate
+ */
+#define EXT4_MAX_BUF_BLOCKS	DIO_MAX_BLOCKS
+#define EXT4_MAX_BUF_CREDITS	DIO_CREDITS
+
 static int ext4_da_writepages(struct address_space *mapping,
 				struct writeback_control *wbc)
 {
-	return mpage_da_writepages(mapping, wbc, ext4_da_get_block_write);
+	handle_t *handle = NULL;
+	int needed_blocks;
+	int ret;
+
+	/*
+	 * Estimate the worse case needed credits to write out
+	 * EXT4_MAX_BUF_BLOCKS pages
+	 */
+	needed_blocks = ext4_writepages_trans_blocks(inode);
+
+	/* start the transaction with credits*/
+	handle = ext4_journal_start(inode, needed_blocks);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		return ret;
+	}
+
+	/* set the max pages could be write-out at a time */
+	wbc->range_end = (wbc->range_start >> PAGE_CACHE_SHIFT
+			+ EXT4_MAX_BUF_BLOCKS) << PAGE_CACHE_SHIFT;
+
+	ret = mpage_da_writepages(mapping, wbc, ext4_da_get_block_write);
+	ext4_journal_stop(handle);
+
+	return ret;
 }
 
 static int ext4_da_write_begin(struct file *file, struct address_space *mapping,


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux