On Fri, 2006-12-15 at 18:05 +0530, Amit K. Arora wrote:
> This is the first patch in the set of two.
>
> It implements the ioctl which will be used for persistent preallocation. It is
> a respun of the previous patch which was posted earlier, and includes
> following changes:
> * Takes care of review comments by Mingming
> * The declaration of extent related macros are now moved to
>   ext4_fs_extent.h (from ext4_fs.h)
> * Updated the logic to calculate block and max_blocks in ext4/ioctl.c,
>   which is used to call get_blocks.
>
> It does _not_ take care of implementing persistent preallocation for
> non-extent based files. It is because of the following reasons:
> * It is being considered as a rare case
> * Users can/should convert their file(s) to extent format to use this
>   feature
> * Moreover, posix_fallocate() can be used for this purpose, if the user
>   does not want to convert the file(s) to the extent based format.
>
>
> Signed-off-by: Amit Arora (aarora@xxxxxxxxxx)
>

Hi Amit,

Looks good to me, a few comments :)

.....
> Index: linux-2.6.19.prealloc/fs/ext4/ioctl.c
> ===================================================================
> --- linux-2.6.19.prealloc.orig/fs/ext4/ioctl.c	2006-12-15 16:44:35.000000000 +0530
> +++ linux-2.6.19.prealloc/fs/ext4/ioctl.c	2006-12-15 17:47:00.000000000 +0530
> @@ -248,6 +248,65 @@
>  		return err;
>  	}
>
> +	case EXT4_IOC_PREALLOCATE: {
> +		struct ext4_falloc_input input;
> +		handle_t *handle;
> +		ext4_fsblk_t block, max_blocks;
> +		int ret, ret2, nblocks = 0, retries = 0;
> +		struct buffer_head map_bh;
> +		unsigned int blkbits = inode->i_blkbits;
> +
> +		if (IS_RDONLY(inode))
> +			return -EROFS;
> +
> +		if (copy_from_user(&input,
> +			(struct ext4_falloc_input __user *) arg, sizeof(input)))
> +			return -EFAULT;
> +
> +		if (input.len == 0)
> +			return -EINVAL;
> +
> +		if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> +			return -ENOTTY;
> +
> +		block = input.offset >> blkbits;
> +		max_blocks = (EXT4_BLOCK_ALIGN(input.len + input.offset,
> +						blkbits) >> blkbits) - block;
> +		handle=ext4_journal_start(inode,
> +			EXT4_DATA_TRANS_BLOCKS(inode->i_sb)+max_blocks);
> +		if (IS_ERR(handle))
> +			return PTR_ERR(handle);
> +retry:
> +		ret = 0;
> +		while(ret>=0 && ret<max_blocks)
> +		{
> +			block = block + ret;
> +			max_blocks = max_blocks - ret;
> +			ret = ext4_ext_get_blocks(handle, inode, block,
> +						max_blocks, &map_bh,
> +						EXT4_CREATE_UNINITIALIZED_EXT, 0);
> +			if(ret > 0 && test_bit(BH_New, &map_bh.b_state))
> +				nblocks = nblocks + ret;
> +		}

ext4_ext_get_blocks() returns 0 when it is mapping (not allocating) a
hole. Here we are allocating, so it should not be possible to get 0
back from ext4_ext_get_blocks(). I think we should quit the loop and
BUG_ON() if ret == 0 here.
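
To make the ret == 0 point concrete, here is a minimal userspace sketch,
not kernel code. With the loop as posted, a return of 0 leaves block and
max_blocks unchanged, so the next iteration repeats the same call. The
names prealloc_range(), get_blocks_fn, stub_chunked() and stub_zero() are
hypothetical stand-ins invented for this illustration; the callback only
imitates the return convention of ext4_ext_get_blocks() described above,
and -EIO stands in for where the kernel code would BUG_ON().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for ext4_ext_get_blocks() in create mode:
 * returns the number of blocks mapped starting at `block`, 0 if nothing
 * was mapped, or a negative errno. *is_new mimics the BH_New test.
 */
typedef long (*get_blocks_fn)(unsigned long block, unsigned long max_blocks,
                              int *is_new);

/*
 * Walk the range [block, block + max_blocks) and count newly allocated
 * blocks in *nblocks. Returns 0 on success or a negative errno.
 */
static long prealloc_range(get_blocks_fn get_blocks, unsigned long block,
                           unsigned long max_blocks, unsigned long *nblocks)
{
    *nblocks = 0;

    while (max_blocks > 0) {
        int is_new = 0;
        long ret = get_blocks(block, max_blocks, &is_new);

        if (ret < 0)
            return ret;         /* real error, e.g. -ENOSPC */
        if (ret == 0)
            return -EIO;        /* no progress: kernel code would BUG_ON() */
        if ((unsigned long)ret > max_blocks)
            return -EIO;        /* callee mapped more than was asked for */

        if (is_new)
            *nblocks += (unsigned long)ret;
        block += (unsigned long)ret;
        max_blocks -= (unsigned long)ret;
    }
    return 0;
}

/* Stub: allocates at most 4 new blocks per call. */
static long stub_chunked(unsigned long block, unsigned long max_blocks,
                         int *is_new)
{
    (void)block;
    *is_new = 1;
    return max_blocks < 4 ? (long)max_blocks : 4;
}

/* Stub: always reports that nothing was mapped. */
static long stub_zero(unsigned long block, unsigned long max_blocks,
                      int *is_new)
{
    (void)block;
    (void)max_blocks;
    *is_new = 0;
    return 0;
}
```

With stub_chunked() the walk finishes after a few calls; with stub_zero()
it stops on the first call instead of looping on the same arguments.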
> +		if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb,
> +								&retries))
> +			goto retry;
> +
> +		if(nblocks) {
> +			mutex_lock(&inode->i_mutex);
> +			inode->i_size = inode->i_size + (nblocks >> blkbits);
> +			EXT4_I(inode)->i_disksize = inode->i_size;
> +			mutex_unlock(&inode->i_mutex);
> +		}

Hmm... We should not need to worry about inode->i_size if we are
preallocating blocks for holes.

Also, looking at the other callers of ext4_*_get_blocks() in the kernel,
it seems that not all of them are protected by i_mutex. I think it is
probably okay not to hold i_mutex while calling ext4_ext_get_blocks().

> +
> +		ext4_mark_inode_dirty(handle, inode);
> +		ret2 = ext4_journal_stop(handle);
> +		if(ret > 0)
> +			ret = ret2;
> +
> +		return ret > 0 ? nblocks : ret;
> +	}
> +

Since the API takes the number of bytes to preallocate, should we convert
blocks to bytes when returning to the user? As written, it returns the
number of allocated blocks.

Do we need to worry about a range that is partly a hole and partly
already allocated? In that case nblocks (the newly preallocated blocks)
will be less than max_blocks (the number of blocks the application asked
for). I wonder what other filesystems like xfs do? Maybe we should do
the same thing.

Thanks,
Mingming