On Wed, Apr 27, 2011 at 11:44 PM, Allison Henderson <achender@xxxxxxxxxxxxxxxxxx> wrote: > On 4/25/2011 10:44 AM, Amir Goldstein wrote: >> >> On Mon, Apr 25, 2011 at 12:08 PM, Yongqiang Yang<xiaoqiangnk@xxxxxxxxx> >> wrote: >>> >>> On Mon, Apr 25, 2011 at 3:51 PM, Amir Goldstein<amir73il@xxxxxxxxx> >>> wrote: >>>> >>>> Hi Allison, >>>> >>>> Sorry for the late response. >>>> I find it hard to digest yet another set of flags, >>>> especially, since there is already an impressive set of >>>> flags for allocation hints, which is what USE_ROOTBLKS flag really is. >>>> >>>> So I think it would be much better to pass the flag in >>>> ext4_allocation_request >>>> and add the 'ar' argument to functions that don't have it, rather than >>>> adding >>>> a 'flags' argument. >>> >>> It depends. I had a look at Allison's patch and found that functions >>> influenced by the patch can be divided into two groups: >>> 1. one of which is extent related and led by ext4_ext_insert_extent() >>> 2. while other one of which is very low level such as >>> ext4_claim_free_blocks() which is used by ext4_da_reserve_space() in a >>> relatively hight level. >>> >>> So I think maybe it's a good idea for extent related functions, but >>> not for low level functions such as ext4_claim_free_blocks. >>> >>> Actually ext4_ext_insert_extent() seldom allocates blocks, because a >>> block can contain a lot of extents. Thus adding 'ar' parameter to >>> ext4_ext_insert_extent() induces much unnecessary 'ar''s initializing. >>> >> >> Yeah, it's not clear what's the best way to handle this. >> From a system designer point of view, I would suggest to add a capability >> flag >> for allocating from reserved space, but it is probably not going to be >> an easy fix to push. >> >>> >>>> >>>> If you do create a patch to pass 'ar' down to has_free_blocks() >>>> I will also be able to use it to pass the HINT_COWING flag. >>> >>> HINT_COWING is being passed via 'ar'. >>> >> >> indeed, which is why if has_free_blocks gets 'ar' we will have that >> flag, which we do not need anyway, since we have the IS_COWING(handle) >> flag. >> >>> Yongqiang. >>>> >>>> Now here is another advise: >>>> In ext4_mb_new_blocks() after ext4_claim_free_blocks(), there is a call >>>> to >>>> dquot_alloc_block(). >>>> You need to call dquot_alloc_block_nofail() when allocating for punch >>>> hole, >>>> otherwise punch hole can fail on quota limits. >>>> We already have a patch for doing that with HINT_COWING flag. >>>> >>>> I think maybe it is best if our groups (punch hole and snapshots) have >>>> a mutual 'next' >>>> repository we can work with to prepare for the 2.6.40 merge window. >>>> It would be even better, if Ted also collaborated his big alloc patches. >>>> >>>> What do you think guys? >>>> >>>> Amir. >>>> >>>> On Tue, Apr 19, 2011 at 10:41 AM, Allison Henderson >>>> <achender@xxxxxxxxxxxxxxxxxx> wrote: >>>>> >>>>> This patch adds a flag to ext4_has_free_blocks which >>>>> enables the use of reserved blocks. This will allow >>>>> a punch hole to proceed even if the disk is full. >>>>> Punching a hole may require additional blocks to first >>>>> split the extents. The blocks will be reclaimed after >>>>> the punch hole completes. >>>>> >>>>> Because ext4_has_free_blocks is a low level function, >>>>> the flag needs to be passed down through several >>>>> functions listed below: >>>>> >>>>> ext4_ext_insert_extent >>>>> ext4_ext_create_new_leaf >>>>> ext4_ext_grow_indepth >>>>> ext4_ext_split >>>>> ext4_ext_new_meta_block >>>>> ext4_mb_new_blocks >>>>> ext4_claim_free_blocks >>>>> ext4_has_free_blocks >>>>> >>>>> Signed-off-by: Allison Henderson<achender@xxxxxxxxxx> >>>>> --- >>>>> :100644 100644 97b970e... 794c4d2... M fs/ext4/balloc.c >>>>> :100644 100644 4daaf2b... 6c1f415... M fs/ext4/ext4.h >>>>> :100644 100644 dd2cb50... 0b186d9... M fs/ext4/extents.c >>>>> :100644 100644 1a86282... ec890fd... M fs/ext4/inode.c >>>>> :100644 100644 a5837a8... db8b120... M fs/ext4/mballoc.c >>>>> :100644 100644 b545ca1... 2d9b12c... M fs/ext4/xattr.c >>>>> fs/ext4/balloc.c | 17 ++++++++++------- >>>>> fs/ext4/ext4.h | 16 +++++++++++++--- >>>>> fs/ext4/extents.c | 27 ++++++++++++++++----------- >>>>> fs/ext4/inode.c | 6 +++--- >>>>> fs/ext4/mballoc.c | 5 +++-- >>>>> fs/ext4/xattr.c | 2 +- >>>>> 6 files changed, 46 insertions(+), 27 deletions(-) >>>>> >>>>> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c >>>>> index 97b970e..794c4d2 100644 >>>>> --- a/fs/ext4/balloc.c >>>>> +++ b/fs/ext4/balloc.c >>>>> @@ -493,7 +493,8 @@ error_return: >>>>> * Check if filesystem has nblocks free& available for allocation. >>>>> * On success return 1, return 0 on failure. >>>>> */ >>>>> -static int ext4_has_free_blocks(struct ext4_sb_info *sbi, s64 nblocks) >>>>> +static int ext4_has_free_blocks(struct ext4_sb_info *sbi, >>>>> + s64 nblocks, int flags) >>>>> { >>>>> s64 free_blocks, dirty_blocks, root_blocks; >>>>> struct percpu_counter *fbc =&sbi->s_freeblocks_counter; >>>>> @@ -522,7 +523,9 @@ static int ext4_has_free_blocks(struct ext4_sb_info >>>>> *sbi, s64 nblocks) >>>>> /* Hm, nope. Are (enough) root reserved blocks available? */ >>>>> if (sbi->s_resuid == current_fsuid() || >>>>> ((sbi->s_resgid != 0)&& in_group_p(sbi->s_resgid)) || >>>>> - capable(CAP_SYS_RESOURCE)) { >>>>> + capable(CAP_SYS_RESOURCE) || >>>>> + (flags& EXT4_HAS_FREE_BLKS_USE_ROOTBLKS)) { >>>>> + >>>>> if (free_blocks>= (nblocks + dirty_blocks)) >>>>> return 1; >>>>> } >>>>> @@ -531,9 +534,9 @@ static int ext4_has_free_blocks(struct ext4_sb_info >>>>> *sbi, s64 nblocks) >>>>> } >>>>> >>>>> int ext4_claim_free_blocks(struct ext4_sb_info *sbi, >>>>> - s64 nblocks) >>>>> + s64 nblocks, int flags) >>>>> { >>>>> - if (ext4_has_free_blocks(sbi, nblocks)) { >>>>> + if (ext4_has_free_blocks(sbi, nblocks, flags)) { >>>>> percpu_counter_add(&sbi->s_dirtyblocks_counter, >>>>> nblocks); >>>>> return 0; >>>>> } else >>>>> @@ -554,7 +557,7 @@ int ext4_claim_free_blocks(struct ext4_sb_info >>>>> *sbi, >>>>> */ >>>>> int ext4_should_retry_alloc(struct super_block *sb, int *retries) >>>>> { >>>>> - if (!ext4_has_free_blocks(EXT4_SB(sb), 1) || >>>>> + if (!ext4_has_free_blocks(EXT4_SB(sb), 1, 0) || >>>>> (*retries)++> 3 || >>>>> !EXT4_SB(sb)->s_journal) >>>>> return 0; >>>>> @@ -577,7 +580,7 @@ int ext4_should_retry_alloc(struct super_block *sb, >>>>> int *retries) >>>>> * error stores in errp pointer >>>>> */ >>>>> ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode >>>>> *inode, >>>>> - ext4_fsblk_t goal, unsigned long *count, int *errp) >>>>> + ext4_fsblk_t goal, unsigned long *count, int *errp, int >>>>> flags) >>>>> { >>>>> struct ext4_allocation_request ar; >>>>> ext4_fsblk_t ret; >>>>> @@ -588,7 +591,7 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, >>>>> struct inode *inode, >>>>> ar.goal = goal; >>>>> ar.len = count ? *count : 1; >>>>> >>>>> - ret = ext4_mb_new_blocks(handle,&ar, errp); >>>>> + ret = ext4_mb_new_blocks(handle,&ar, errp, flags); >>>>> if (count) >>>>> *count = ar.len; >>>>> /* >>>>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h >>>>> index 4daaf2b..6c1f415 100644 >>>>> --- a/fs/ext4/ext4.h >>>>> +++ b/fs/ext4/ext4.h >>>>> @@ -512,6 +512,8 @@ struct ext4_new_group_data { >>>>> /* Convert extent to initialized after IO complete */ >>>>> #define EXT4_GET_BLOCKS_IO_CONVERT_EXT >>>>> (EXT4_GET_BLOCKS_CONVERT|\ >>>>> >>>>> EXT4_GET_BLOCKS_CREATE_UNINIT_EXT) >>>>> + /* Punch out blocks of an extent */ >>>>> +#define EXT4_GET_BLOCKS_PUNCH_OUT_EXT 0x0020 >>>>> >>>>> /* >>>>> * Flags used by ext4_free_blocks >>>>> @@ -521,6 +523,11 @@ struct ext4_new_group_data { >>>>> #define EXT4_FREE_BLOCKS_VALIDATED 0x0004 >>>>> >>>>> /* >>>>> + * Flags used by ext4_has_free_blocks >>>>> + */ >>>>> +#define EXT4_HAS_FREE_BLKS_USE_ROOTBLKS 0x0001 >>>>> + >>>>> +/* >>>>> * ioctl commands >>>>> */ >>>>> #define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS >>>>> @@ -1638,8 +1645,10 @@ extern int ext4_bg_has_super(struct super_block >>>>> *sb, ext4_group_t group); >>>>> extern unsigned long ext4_bg_num_gdb(struct super_block *sb, >>>>> ext4_group_t group); >>>>> extern ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct >>>>> inode *inode, >>>>> - ext4_fsblk_t goal, unsigned long *count, int >>>>> *errp); >>>>> -extern int ext4_claim_free_blocks(struct ext4_sb_info *sbi, s64 >>>>> nblocks); >>>>> + ext4_fsblk_t goal, unsigned long *count, >>>>> + int *errp, int flags); >>>>> +extern int ext4_claim_free_blocks(struct ext4_sb_info *sbi, >>>>> + s64 nblocks, int flags); >>>>> extern void ext4_add_groupblocks(handle_t *handle, struct super_block >>>>> *sb, >>>>> ext4_fsblk_t block, unsigned long >>>>> count); >>>>> extern ext4_fsblk_t ext4_count_free_blocks(struct super_block *); >>>>> @@ -1696,7 +1705,8 @@ extern long ext4_mb_max_to_scan; >>>>> extern int ext4_mb_init(struct super_block *, int); >>>>> extern int ext4_mb_release(struct super_block *); >>>>> extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *, >>>>> - struct ext4_allocation_request *, int >>>>> *); >>>>> + struct ext4_allocation_request *, >>>>> + int *, int flags); >>>>> extern int ext4_mb_reserve_blocks(struct super_block *, int); >>>>> extern void ext4_discard_preallocations(struct inode *); >>>>> extern int __init ext4_init_mballoc(void); >>>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >>>>> index dd2cb50..0b186d9 100644 >>>>> --- a/fs/ext4/extents.c >>>>> +++ b/fs/ext4/extents.c >>>>> @@ -192,12 +192,12 @@ static ext4_fsblk_t ext4_ext_find_goal(struct >>>>> inode *inode, >>>>> static ext4_fsblk_t >>>>> ext4_ext_new_meta_block(handle_t *handle, struct inode *inode, >>>>> struct ext4_ext_path *path, >>>>> - struct ext4_extent *ex, int *err) >>>>> + struct ext4_extent *ex, int *err, int flags) >>>>> { >>>>> ext4_fsblk_t goal, newblock; >>>>> >>>>> goal = ext4_ext_find_goal(inode, path, >>>>> le32_to_cpu(ex->ee_block)); >>>>> - newblock = ext4_new_meta_blocks(handle, inode, goal, NULL, >>>>> err); >>>>> + newblock = ext4_new_meta_blocks(handle, inode, goal, NULL, err, >>>>> flags); >>>>> return newblock; >>>>> } >>>>> >>>>> @@ -793,7 +793,7 @@ static int ext4_ext_insert_index(handle_t *handle, >>>>> struct inode *inode, >>>>> */ >>>>> static int ext4_ext_split(handle_t *handle, struct inode *inode, >>>>> struct ext4_ext_path *path, >>>>> - struct ext4_extent *newext, int at) >>>>> + struct ext4_extent *newext, int at, int >>>>> flags) >>>>> { >>>>> struct buffer_head *bh = NULL; >>>>> int depth = ext_depth(inode); >>>>> @@ -847,7 +847,7 @@ static int ext4_ext_split(handle_t *handle, struct >>>>> inode *inode, >>>>> ext_debug("allocate %d blocks for indexes/leaf\n", depth - at); >>>>> for (a = 0; a< depth - at; a++) { >>>>> newblock = ext4_ext_new_meta_block(handle, inode, path, >>>>> - newext,&err); >>>>> + newext,&err, flags); >>>>> if (newblock == 0) >>>>> goto cleanup; >>>>> ablocks[a] = newblock; >>>>> @@ -1057,7 +1057,7 @@ cleanup: >>>>> */ >>>>> static int ext4_ext_grow_indepth(handle_t *handle, struct inode >>>>> *inode, >>>>> struct ext4_ext_path *path, >>>>> - struct ext4_extent *newext) >>>>> + struct ext4_extent *newext, int >>>>> flags) >>>>> { >>>>> struct ext4_ext_path *curp = path; >>>>> struct ext4_extent_header *neh; >>>>> @@ -1065,7 +1065,8 @@ static int ext4_ext_grow_indepth(handle_t >>>>> *handle, struct inode *inode, >>>>> ext4_fsblk_t newblock; >>>>> int err = 0; >>>>> >>>>> - newblock = ext4_ext_new_meta_block(handle, inode, path, >>>>> newext,&err); >>>>> + newblock = ext4_ext_new_meta_block(handle, inode, path, >>>>> + newext,&err, flags); >>>>> if (newblock == 0) >>>>> return err; >>>>> >>>>> @@ -1141,7 +1142,7 @@ out: >>>>> */ >>>>> static int ext4_ext_create_new_leaf(handle_t *handle, struct inode >>>>> *inode, >>>>> struct ext4_ext_path *path, >>>>> - struct ext4_extent *newext) >>>>> + struct ext4_extent *newext, int >>>>> flags) >>>>> { >>>>> struct ext4_ext_path *curp; >>>>> int depth, i, err = 0; >>>>> @@ -1161,7 +1162,7 @@ repeat: >>>>> if (EXT_HAS_FREE_INDEX(curp)) { >>>>> /* if we found index with free entry, then use that >>>>> * entry: create all needed subtree and add new leaf */ >>>>> - err = ext4_ext_split(handle, inode, path, newext, i); >>>>> + err = ext4_ext_split(handle, inode, path, newext, i, >>>>> flags); >>>>> if (err) >>>>> goto out; >>>>> >>>>> @@ -1174,7 +1175,7 @@ repeat: >>>>> err = PTR_ERR(path); >>>>> } else { >>>>> /* tree is full, time to grow in depth */ >>>>> - err = ext4_ext_grow_indepth(handle, inode, path, >>>>> newext); >>>>> + err = ext4_ext_grow_indepth(handle, inode, path, >>>>> newext, flags); >>>>> if (err) >>>>> goto out; >>>>> >>>>> @@ -1668,6 +1669,7 @@ int ext4_ext_insert_extent(handle_t *handle, >>>>> struct inode *inode, >>>>> int depth, len, err; >>>>> ext4_lblk_t next; >>>>> unsigned uninitialized = 0; >>>>> + int free_blks_flags = 0; >>>>> >>>>> if (unlikely(ext4_ext_get_actual_len(newext) == 0)) { >>>>> EXT4_ERROR_INODE(inode, "ext4_ext_get_actual_len(newext) >>>>> == 0"); >>>>> @@ -1742,7 +1744,10 @@ repeat: >>>>> * There is no free space in the found leaf. >>>>> * We're gonna add a new leaf in the tree. >>>>> */ >>>>> - err = ext4_ext_create_new_leaf(handle, inode, path, newext); >>>>> + if (flag& EXT4_GET_BLOCKS_PUNCH_OUT_EXT) >>>>> + free_blks_flags = EXT4_HAS_FREE_BLKS_USE_ROOTBLKS; >>>>> + err = ext4_ext_create_new_leaf(handle, inode, path, >>>>> + newext, free_blks_flags); >>>>> if (err) >>>>> goto cleanup; >>>>> depth = ext_depth(inode); >>>>> @@ -3446,7 +3451,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct >>>>> inode *inode, >>>>> else >>>>> /* disable in-core preallocation for non-regular files >>>>> */ >>>>> ar.flags = 0; >>>>> - newblock = ext4_mb_new_blocks(handle,&ar,&err); >>>>> + newblock = ext4_mb_new_blocks(handle,&ar,&err, 0); >>>>> if (!newblock) >>>>> goto out2; >>>>> ext_debug("allocate new block: goal %llu, found %llu/%u\n", >>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>>> index 1a86282..ec890fd 100644 >>>>> --- a/fs/ext4/inode.c >>>>> +++ b/fs/ext4/inode.c >>>>> @@ -640,7 +640,7 @@ static int ext4_alloc_blocks(handle_t *handle, >>>>> struct inode *inode, >>>>> count = target; >>>>> /* allocating blocks for indirect blocks and direct >>>>> blocks */ >>>>> current_block = ext4_new_meta_blocks(handle, inode, >>>>> - goal,&count, >>>>> err); >>>>> + goal,&count, >>>>> err, 0); >>>>> if (*err) >>>>> goto failed_out; >>>>> >>>>> @@ -686,7 +686,7 @@ static int ext4_alloc_blocks(handle_t *handle, >>>>> struct inode *inode, >>>>> /* enable in-core preallocation only for regular files >>>>> */ >>>>> ar.flags = EXT4_MB_HINT_DATA; >>>>> >>>>> - current_block = ext4_mb_new_blocks(handle,&ar, err); >>>>> + current_block = ext4_mb_new_blocks(handle,&ar, err, 0); >>>>> if (unlikely(current_block + ar.len> EXT4_MAX_BLOCK_FILE_PHYS)) >>>>> { >>>>> EXT4_ERROR_INODE(inode, >>>>> "current_block %llu + ar.len %d> %d!", >>>>> @@ -1930,7 +1930,7 @@ repeat: >>>>> * We do still charge estimated metadata to the sb though; >>>>> * we cannot afford to run out of free blocks. >>>>> */ >>>>> - if (ext4_claim_free_blocks(sbi, md_needed + 1)) { >>>>> + if (ext4_claim_free_blocks(sbi, md_needed + 1, 0)) { >>>>> dquot_release_reservation_block(inode, 1); >>>>> if (ext4_should_retry_alloc(inode->i_sb,&retries)) { >>>>> yield(); >>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >>>>> index a5837a8..db8b120 100644 >>>>> --- a/fs/ext4/mballoc.c >>>>> +++ b/fs/ext4/mballoc.c >>>>> @@ -4276,7 +4276,8 @@ static int ext4_mb_discard_preallocations(struct >>>>> super_block *sb, int needed) >>>>> * to usual allocation >>>>> */ >>>>> ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, >>>>> - struct ext4_allocation_request *ar, int >>>>> *errp) >>>>> + struct ext4_allocation_request *ar, int >>>>> *errp, >>>>> + int flags) >>>>> { >>>>> int freed; >>>>> struct ext4_allocation_context *ac = NULL; >>>>> @@ -4303,7 +4304,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, >>>>> * there is enough free blocks to do block allocation >>>>> * and verify allocation doesn't exceed the quota >>>>> limits. >>>>> */ >>>>> - while (ar->len&& ext4_claim_free_blocks(sbi, ar->len)) >>>>> { >>>>> + while (ar->len&& ext4_claim_free_blocks(sbi, ar->len, >>>>> flags)) { >>>>> /* let others to free the space */ >>>>> yield(); >>>>> ar->len = ar->len>> 1; >>>>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c >>>>> index b545ca1..2d9b12c 100644 >>>>> --- a/fs/ext4/xattr.c >>>>> +++ b/fs/ext4/xattr.c >>>>> @@ -821,7 +821,7 @@ inserted: >>>>> goal = goal& EXT4_MAX_BLOCK_FILE_PHYS; >>>>> >>>>> block = ext4_new_meta_blocks(handle, inode, >>>>> - goal, NULL,&error); >>>>> + goal, NULL,&error, >>>>> 0); >>>>> if (error) >>>>> goto cleanup; >>>>> >>>>> -- >>>>> 1.7.1 >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" >>>>> in >>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> >>> >>> >>> >>> -- >>> Best Wishes >>> Yongqiang Yang >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > Hi All, > > I did some looking around with the idea of using an allocation_request > instead of a flags parameter, but I noticed that ext4_new_meta_blocks is > already setting up an allocation_request to pass to ext4_mb_new_blocks. > Would it make sense then that we add an "ar_flags" parameter instead of a > "flag" or another "ar" parameter? That way, ext4_new_meta_blocks can just > add the flags to the ar that it already has, and we wouldn't have to change > the ext4_mb_new_blocks parameters. And then USE_ROOTBLKS can be added on to > the EXT4_MB_HINT scheme instead of starting a new scheme. That would avoid > the extra ar initializing. What does everybody think? Would this work for > the HINT_COWING flag? I think this would be perfect. ext4_new_meta_blocks() is not much more than a helper to setup the ar struct, so passing 'flags' (in that context I see no reason to call it ar_flags) to it makes sense. The important thing is that USE_ROOTBLKS is added to the EXT4_MB_HINT scheme and passed all the way down to has_free_blocks with the 'ar' struct. This discussion makes me wonder (CC'ing Jan and Eric for that), wasn't there ever an intention to allow the use of reserved space for allocation of any metadata blocks? The use case is delayed allocation combined with async mmap writes. Do we always account for enough metadata blocks when doing delayed allocation? Both for extent and indirect mapped files? Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html