Ext4: journal credits calculation cleanup and fix for nonextent writepage

From: Mingming Cao <cmm@xxxxxxxxxx>

When considering how many journal credits are needed for modifying a chunk
of data, we need to account for the super block, inode block, quota blocks
and xattr block, the indirect/index blocks, and also the group bitmap and
group descriptor blocks for new allocation (including data and
indirect/index blocks).

Many places in ext4 do this calculation on their own. They often miss one
or two metadata blocks, and they often assume single block allocation
without considering the case of allocating multiple chunks.

This patch tries to clean up the current journal credit code and provides
some common helper functions to calculate the journal credits, to be used
for writepage, writepages, DIO, fallocate, migration, defrag, and for both
nonextent and extent files.

This patch modifies the writepage/write_begin credit calculation for
nonextent files to use the new helper function. It also fixes the problem
that writepage on nonextent files did not consider the case of
blocksize < pagesize, which could possibly need multiple block allocations
in a single transaction.

Signed-off-by: Mingming Cao <cmm@xxxxxxxxxx>
---
 fs/ext4/ext4.h      |    1
 fs/ext4/ext4_jbd2.h |    8 +++
 fs/ext4/inode.c     |  136 ++++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 110 insertions(+), 35 deletions(-)

Index: linux-2.6.27-rc1/fs/ext4/inode.c
===================================================================
--- linux-2.6.27-rc1.orig/fs/ext4/inode.c	2008-08-11 15:11:51.000000000 -0700
+++ linux-2.6.27-rc1/fs/ext4/inode.c	2008-08-11 15:11:55.000000000 -0700
@@ -4333,56 +4333,125 @@ int ext4_getattr(struct vfsmount *mnt, s
 }
 
 /*
- * How many blocks doth make a writepage()?
+ * Account for block groups bitmaps and block group descriptor blocks
+ * if modify datablocks and indexing blocks
+ * worse case, the nrblocks and indexs blocks spread
+ * over different block groups
  *
- * With N blocks per page, it may be:
- *	N data blocks
- *	2 indirect block
- *	2 dindirect
- *	1 tindirect
- *	N+5 bitmap blocks (from the above)
- *	N+5 group descriptor summary blocks
- *	1 inode block
- *	1 superblock.
- *	2 * EXT4_SINGLEDATA_TRANS_BLOCKS for the quote files
- *
- *	3 * (N + 5) + 2 + 2 * EXT4_SINGLEDATA_TRANS_BLOCKS
- *
- * With ordered or writeback data it's the same, less the N data blocks.
- *
- * If the inode's direct blocks can hold an integral number of pages then a
- * page cannot straddle two indirect blocks, and we can only touch one indirect
- * and dindirect block, and the "5" above becomes "3".
+ * Also account for superblock, inode, quota and xattr blocks
+ */
+int ext4_meta_trans_blocks(struct inode* inode, int nrblocks, int idxblocks)
+{
+	int groups, gdpblocks;
+	int ret = 0;
+
+	groups = nrblocks + idxblocks;
+	gdpblocks = groups;
+	if (groups > EXT4_SB(inode->i_sb)->s_groups_count)
+		groups = EXT4_SB(inode->i_sb)->s_groups_count;
+	if (groups > EXT4_SB(inode->i_sb)->s_gdb_count)
+		gdpblocks = EXT4_SB(inode->i_sb)->s_gdb_count;
+
+	/* bitmaps and block group descriptor blocks */
+	ret += groups + gdpblocks;
+
+	ret += idxblocks;
+
+	/* journalled mode, include buffer to modify data blocks */
+	if (ext4_should_journal_data(inode))
+		ret += nrblocks;
+
+	/* Blocks for super block, inode, quota and xattr blocks */
+	ret += EXT4_META_TRANS_BLOCKS(inode->i_sb);
+
+	return ret;
+}
+
+static int ext4_indirect_trans_blocks(struct inode *inode, int nrblocks,
+					int chunk)
+{
+	int indirects;
+
+	/* if nrblocks are contigous */
+	if (chunk) {
+		/*
+		 * With N contigous data blocks, it need at most
+		 * N/EXT4_ADDR_PER_BLOCK(inode->i_sb) indirect blocks
+		 * 2 dindirect blocks
+		 * 1 tindirect block
+		 */
+		indirects = nrblocks / EXT4_ADDR_PER_BLOCK(inode->i_sb);
+		return indirects + 3;
+	}
+	/*
+	 * if nrblocks are not contigous, worse case, each block touch
+	 * a indirect block, and each indirect block touch a double indirect
+	 * block, plus a triple indirect block
+	 */
+	indirects = nrblocks * 2 + 1;
+	return indirects;
+}
+/*
+ * How many journal blocks are need to modify N blocks contigous data()?
+ *
+ * It need to account indirect blocks, data blocks, and
+ * bitmap blocks and group descriptor blocks.
  *
  * This still overestimates under most circumstances. If we were to pass the
  * start and end offsets in here as well we could do block_to_path() on each
  * block and work out the exact number of indirects which are touched. Pah.
  */
-int ext4_writepage_trans_blocks(struct inode *inode)
+static int ext4_writeblocks_trans_credits_old(struct inode *inode, int nrblocks,
+					int chunk)
 {
-	int bpp = ext4_journal_blocks_per_page(inode);
-	int indirects = (EXT4_NDIR_BLOCKS % bpp) ? 5 : 3;
+	int indirects;
 	int ret;
 
-	if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
-		return ext4_ext_writepage_trans_blocks(inode, bpp);
-
-	if (ext4_should_journal_data(inode))
-		ret = 3 * (bpp + indirects) + 2;
-	else
-		ret = 2 * (bpp + indirects) + 2;
+	/*
+	 * How many index blocks need to touch to modify nrblocks?
+	 * The "Chunk" flag indicating whether the nrblocks is
+	 * physically contigous on disk
+	 *
+	 * For Direct IO and fallocate, they calls get_block to allocate
+	 * one single extent at a time, so they could set the "Chunk" flag
+	 */
+	indirects = ext4_indirect_trans_blocks(inode, nrblocks, chunk);
 
-#ifdef CONFIG_QUOTA
-	/* We know that structure was already allocated during DQUOT_INIT so
-	 * we will be updating only the data blocks + inodes */
-	ret += 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
+	/* Account for block group bitmaps and block groups
+	 * descriptors.Worse case, the nrblocks+indirects blocks spread
+	 * over different block groups
+	 */
+	ret = ext4_meta_trans_blocks(inode, nrblocks, indirects);
 
 	return ret;
 }
 
 /*
+ * Calulate the total number of credits to reserve to fit
+ * the modification of a single pages into a single transaction
+ *
+ * This could be called via ext4_write_begin() or later
+ * ext4_da_writepages() in delalyed allocation case.
+ *
+ * In both case it's possible that we could allocating multiple
+ * chunks of blocks. We need to consider the worse case, when
+ * one new block per extent.
+ *
+ * For Direct IO and fallocate, the journal credits reservation
+ * is based on one single extent allocation, so they could use
+ * EXT4_DATA_TRANS_BLOCKS to get the needed credit to log a single
+ * chunk of allocation needs.
+ */
+int ext4_writepage_trans_blocks(struct inode *inode)
+{
+	int bpp = ext4_journal_blocks_per_page(inode);
+
+	if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+		return ext4_writeblocks_trans_credits_old(inode, bpp, 0);
+	return ext4_ext_writepage_trans_blocks(inode, bpp);
+}
+/*
  * The caller must have previously called ext4_reserve_inode_write().
  * Give this, we know that the caller already has write access to iloc->bh.
  */
Index: linux-2.6.27-rc1/fs/ext4/ext4.h
===================================================================
--- linux-2.6.27-rc1.orig/fs/ext4/ext4.h	2008-08-11 15:11:51.000000000 -0700
+++ linux-2.6.27-rc1/fs/ext4/ext4.h	2008-08-11 15:11:55.000000000 -0700
@@ -1072,6 +1072,7 @@ extern void ext4_set_inode_flags(struct
 extern void ext4_get_inode_flags(struct ext4_inode_info *);
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
+extern int ext4_meta_trans_blocks(struct inode *, int nrblocks, int idxblocks);
 extern int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
Index: linux-2.6.27-rc1/fs/ext4/ext4_jbd2.h
===================================================================
--- linux-2.6.27-rc1.orig/fs/ext4/ext4_jbd2.h	2008-08-11 15:11:51.000000000 -0700
+++ linux-2.6.27-rc1/fs/ext4/ext4_jbd2.h	2008-08-11 15:11:55.000000000 -0700
@@ -51,6 +51,14 @@
 					 EXT4_XATTR_TRANS_BLOCKS - 2 + \
 					 2*EXT4_QUOTA_TRANS_BLOCKS(sb))
 
+/*
+ * Define the number of metadata blocks we need to account to modify data.
+ *
+ * This include super block, inode block, quota blocks and xattr blocks
+ */
+#define EXT4_META_TRANS_BLOCKS(sb)	(EXT4_XATTR_TRANS_BLOCKS + \
+					2*EXT4_QUOTA_TRANS_BLOCKS(sb))
+
 /* Delete operations potentially hit one directory's namespace plus an
  * entire inode, plus arbitrary amounts of bitmap/indirection data. Be
  * generous. We can grow the delete transaction later if necessary. */
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
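
For readers who want to check the arithmetic in ext4_meta_trans_blocks()
and ext4_indirect_trans_blocks() outside the kernel, the following is a
rough userspace sketch of the same computation. It is not part of the
patch. struct fs_geom and the function names indirect_credits(),
meta_credits() and writepage_credits() are hypothetical stand-ins, and the
meta_blocks field is a placeholder for whatever EXT4_META_TRANS_BLOCKS(sb)
evaluates to on a given configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock values the patch reads. */
struct fs_geom {
	int groups_count;   /* plays the role of s_groups_count */
	int gdb_count;      /* plays the role of s_gdb_count */
	int addr_per_block; /* plays the role of EXT4_ADDR_PER_BLOCK(sb) */
	int meta_blocks;    /* placeholder for EXT4_META_TRANS_BLOCKS(sb) */
};

/* Same arithmetic as ext4_indirect_trans_blocks() in the patch. */
static int indirect_credits(const struct fs_geom *g, int nrblocks, bool chunk)
{
	assert(g->addr_per_block > 0);

	/* Contiguous run: N / addr_per_block indirect blocks,
	 * plus 2 dindirect blocks and 1 tindirect block. */
	if (chunk)
		return nrblocks / g->addr_per_block + 3;

	/* Scattered blocks: one indirect and one dindirect per data block,
	 * plus a single tindirect block. */
	return nrblocks * 2 + 1;
}

/* Same arithmetic as ext4_meta_trans_blocks() in the patch. */
static int meta_credits(const struct fs_geom *g, int nrblocks, int idxblocks,
			bool journal_data)
{
	int groups = nrblocks + idxblocks;
	int gdpblocks = groups;
	int ret = 0;

	/* Worst case: every touched block sits in a different group. */
	if (groups > g->groups_count)
		groups = g->groups_count;
	if (groups > g->gdb_count)
		gdpblocks = g->gdb_count;

	ret += groups + gdpblocks; /* bitmaps and group descriptor blocks */
	ret += idxblocks;          /* the index blocks themselves */
	if (journal_data)
		ret += nrblocks;   /* data blocks are journalled too */
	ret += g->meta_blocks;     /* super block, inode, quota, xattr */
	return ret;
}

/* Nonextent writepage path: bpp blocks per page, never assumed contiguous. */
static int writepage_credits(const struct fs_geom *g, int bpp, bool journal_data)
{
	return meta_credits(g, bpp, indirect_credits(g, bpp, false), journal_data);
}
```

With an assumed geometry of 128 groups, 4 group descriptor blocks, 256
addresses per block and a made-up meta_blocks value of 10, a page holding 4
blocks needs 9 index blocks, 13 bitmap blocks and 4 group descriptor
blocks, so writepage_credits() returns 36 without data journalling and 40
with it.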