Thank you Andreas and Ted for reviewing the patch. Ted, thank you for pointing out about the subject line when sending a new patch - I'll be careful next time. I have handled all the review comments. I will send out a patch with handled comments. About testing this patch, I have created a new xfstest that verifies that directory shrinks after deleting files in it. I'll send that patch out too. On Tue, Jan 29, 2019 at 5:58 PM Andreas Dilger <adilger@xxxxxxxxx> wrote: > > On Jan 23, 2019, at 11:32 AM, harshadshirwadkar@xxxxxxxxx wrote: > > > > From: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> > > > > This patch is the first step towards shrinking htree based directories > > when files are deleted. We truncate directory inode when a directory > > entry removal causes last directory block to be empty. This patch just > > removes the last block. We may end up in a situation where many > > intermediate dirent blocks in directory inode are empty. Removing > > those blocks would be handled later. > > > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> > > --- > > fs/ext4/namei.c | 130 ++++++++++++++++++++++++++++++++++++++++-------- > > 1 file changed, 109 insertions(+), 21 deletions(-) > > > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > > index 2a4c25c4681d..79cb88ba898a 100644 > > --- a/fs/ext4/namei.c > > +++ b/fs/ext4/namei.c > > @@ -273,7 +273,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash, > > __u32 *start_hash); > > static struct buffer_head * ext4_dx_find_entry(struct inode *dir, > > struct ext4_filename *fname, > > - struct ext4_dir_entry_2 **res_dir); > > + struct ext4_dir_entry_2 **res_dir, > > + struct dx_frame *dx_frame); > > (style) these should align after '(' on the first line > > > @@ -380,6 +381,7 @@ static void ext4_dirent_csum_set(struct inode *inode, > > (void *)t - (void *)dirent); > > } > > > > + > > int ext4_handle_dirty_dirent_node(handle_t *handle, > > struct inode *inode, > > struct buffer_head *bh) > > (style) No need for this hunk. > > > @@ -866,6 +868,42 @@ dx_probe(struct ext4_filename *fname, struct inode *dir, > > return ret_err; > > } > > > > +static void > > +ext4_dx_delete_entry(handle_t *handle, struct inode *dir, > > + struct dx_frame *dx_frame, __le64 block) > > +{ > > + struct dx_entry *entries; > > + unsigned int count, limit; > > + int err, i; > > + > > + entries = dx_frame->entries; > > + count = dx_get_count(entries); > > + limit = dx_get_limit(entries); > > + > > + for (i = 0; i < count; i++) > > + if (entries[i].block == block) > > + break; > > + > > + if (i >= count) > > + return; > > + > > + err = ext4_journal_get_write_access(handle, dx_frame->bh); > > + if (err) { > > + ext4_std_error(dir->i_sb, err); > > + return; > > + } > > + > > + for (; i < count - 1; i++) > > + entries[i] = entries[i + 1]; > > + > > + dx_set_count(entries, count - 1); > > + dx_set_limit(entries, limit); > > I don't think you need to update the limit just because of the count changing? > This is just setting it back to the same value that dx_get_limit() returned earlier. > > > @@ -1309,6 +1347,14 @@ > > +static int is_empty_dirent_block(struct inode *dir, struct buffer_head *bh) > > (style) this function should be type bool. The older "is_*" functions were written > before "bool" was available in the kernel, though a separate cleanup patch to convert > them to bool would be welcome. > > > +{ > > + struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)bh->b_data; > > + > > + return (ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize) > > + == dir->i_sb->s_blocksize); > > (style) no need for () around return value. > > > > > @@ -1339,7 +1385,8 @@ static int is_dx_internal_node(struct inode *dir, ext4_lblk_t block, > > static struct buffer_head * ext4_find_entry (struct inode *dir, > > const struct qstr *d_name, > > struct ext4_dir_entry_2 **res_dir, > > - int *inlined) > > + int *inlined, > > + struct dx_frame *dx_frame) > > (style) this can fit on the previous line, and they can all align after '(' on > the first line. Please also remove the space after '*' in the function declaration. > > > @@ -1486,9 +1533,10 @@ static struct buffer_head * ext4_find_entry (struct inode *dir, > > return ret; > > } > > > > -static struct buffer_head * ext4_dx_find_entry(struct inode *dir, > > - struct ext4_filename *fname, > > - struct ext4_dir_entry_2 **res_dir) > > +static struct buffer_head *ext4_dx_find_entry( > > + struct inode *dir, struct ext4_filename *fname, > > (style) it isn't clear why you moved "struct inode *dir" from the previous line? > The continued lines look like they can properly align after '(' on the previous line. > > > @@ -2337,9 +2389,13 @@ int ext4_generic_delete_entry(handle_t *handle, > > static int ext4_delete_entry(handle_t *handle, > > struct inode *dir, > > struct ext4_dir_entry_2 *de_del, > > - struct buffer_head *bh) > > + struct buffer_head *bh, > > + struct dx_frame *dx_frame) > > { > > + struct ext4_map_blocks map; > > int err, csum_size = 0; > > + int should_truncate = 0; > > (style) bool should_truncate > > > @@ -2363,11 +2419,35 @@ static int ext4_delete_entry(handle_t *handle, > > if (err) > > goto out; > > > > + if (dx_frame && dx_frame->bh && is_empty_dirent_block(dir, bh)) { > > + > > (style) no blank line here. > > > + map.m_lblk = (dir->i_size - 1) >> > > + (dir->i_sb->s_blocksize_bits); > > (style) no need for parenthesis around dir->i_sb->s_blocksize_bits. > (style) this can fit on the previous line and still be under 80 columns. > > > + map.m_len = 1; > > + err = ext4_map_blocks(handle, dir, &map, 0); > > + > > + if ((err > 0) && (bh->b_blocknr == map.m_pblk)) { > > (style) no need for extra parenthesis around "err > 0" and "b_blocknr == m_pblk". > > > + ext4_dx_delete_entry(handle, dir, dx_frame, > > + cpu_to_le64(map.m_lblk)); > > + should_truncate = 1; > > + } > > + } > > + > > BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata"); > > + > > err = ext4_handle_dirty_dirent_node(handle, dir, bh); > > if (unlikely(err)) > > goto out; > > > > + if (should_truncate) { > > + dir->i_size -= dir->i_sb->s_blocksize; > > + ext4_truncate(dir); > > Since we have already mapped the last block above, I wonder if we can do something > lighter weight than calling ext4_truncate() here? Maybe moving the ext4_truncate() > guts into a helper function like ext4_truncate_internal() that avoids all of the > complexity (orphan list, resizing the transaction, inline data, etc). > > > + if (dir->i_size == dir->i_sb->s_blocksize) { > > + ext4_clear_inode_flag(dir, EXT4_INODE_INDEX); > > + ext4_mark_inode_dirty(handle, dir); > > Are there any places in the code that assume an indexed directory will remain as such? > The tricky part here is that any valid htree directory will have 3 blocks (dx_root, > plus two leaf blocks), so it might be problematic to go back to a single non-htree > block, especially since this doesn't look like it has any locking. It is probably > worthwhile to check the code for this. After handling Ted's comment about not deleting the last block if it makes intermediate node empty, this patch will never result in a situation where dir->i_size == blocksize. So, this hunk is not needed anymore. > > > @@ -2985,6 +3065,9 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry) > > struct buffer_head *bh; > > struct ext4_dir_entry_2 *de; > > handle_t *handle = NULL; > > + struct dx_frame dx_frame; > > + > > + memset(&dx_frame, 0, sizeof(dx_frame)); > > Rather than an explicit memset(), this can use a struct initializer when it is declared: > > struct dx_frame dx_frame = { NULL }; > > Are there existing xfstests that cover this case? Certainly deleting files from a > directory, but it would be useful to have something that checks whether the directory > size is shrinking and this code is working. > > Cheers, Andreas > > > > >