Re: [External] Re: [PATCH 1/2] ext4: use ext4_ext_remove_space() for fast commit replay delete range

On Wed, Feb 2, 2022 at 4:34 AM Ritesh Harjani <riteshh@xxxxxxxxxxxxx> wrote:
>
> Hello Xin,
>
> Sorry about revisiting this thread so late :(
> Recently when I was working on one of the fast_commit issue, I got interested
> in looking into some of those recent fast_commit fixes.
>
> Hence some of these queries.
>
> On 21/12/23 11:23AM, Xin Yin wrote:
> > For now, we use ext4_punch_hole() during the fast commit replay delete
> > range procedure. But it is affected by inode->i_size, which may not be
> > correct during the fast commit replay procedure. The following test will
> > fail.
> >
> > -create & write foo (len 1000K)
> > -falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K)
> > -create & fsync bar
> ^^^^ do you mean "fsync foo" or is this actually a new file create and fsync
> bar?
bar is a newly created file. It is a sibling of foo in the same
directory, like this:
./foo ./bar

>
>
> > -falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K)
> > -fsync foo
> > -crash before a full commit
> >
> > After the fast_commit replay procedure, the range 400K-500K will not be
> > removed. Because in this case, when calling ext4_punch_hole() the
> > inode->i_size is 0, and it just returns without doing anything.
>
> I tried looking into this, but I am not able to wrap my head around when
> inode->i_size will be 0.
>
> So, what I think should happen is that when you do fallocate/fsync on foo in
> your above list of operations, the inode i_disksize will be updated anyway
> using ext4_mark_inode_dirty(), and during the replay phase inode->i_size will
> hold the right value, no?
Yes, inode->i_size ends up with the right value, because
ext4_fc_replay_inode() updates the inode to its final state. But during
the replay phase ext4_fc_replay_inode() is usually the last step, so
before it runs inode->i_size may not be correct.

>
> Could you please help me understand when, where and how inode->i_size will
> be 0?
I didn't check why inode->i_size is 0 in this case. I just think
inode->i_size should not affect the behavior of the replay phase.
Another case is that inode->i_size may not include unwritten blocks. If
a file has unwritten blocks at its end, beyond i_size, we cannot use
ext4_punch_hole() to remove those blocks during the replay phase.
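
To make the i_size dependency concrete, below is a small userspace model
of the range checks as I understand them from ext4_punch_hole(). It is a
simplified sketch, not the kernel code: model_punch_len() and
MODEL_PAGE_SIZE are hypothetical names, and the 4K page size is an
assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/*
 * Hypothetical userspace model, not kernel code: returns how many bytes,
 * starting at `offset`, a punch-hole request still covers after the
 * i_size checks have been applied.
 */
uint64_t model_punch_len(uint64_t i_size, uint64_t offset, uint64_t len)
{
	assert(len > 0);

	/* A hole that starts at or beyond i_size is skipped entirely. */
	if (offset >= i_size)
		return 0;

	/* A hole that runs past i_size ends after the page holding i_size. */
	if (offset + len > i_size)
		len = i_size + MODEL_PAGE_SIZE -
		      (i_size & (MODEL_PAGE_SIZE - 1)) - offset;

	return len;
}
```

With i_size == 0 the 300K-500K request from the test is reduced to
nothing, and with an i_size smaller than the end of the range it is
shortened, so any blocks beyond i_size stay allocated.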

>
> Also - it would be helpful if you have an easy reproducer for the issue you
> mentioned.
The attached test code reproduces this issue. Hope it helps.
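
For reference, here is how the byte ranges used in the test map onto
logical blocks, assuming a 4K block size (s_blocksize_bits = 12). This is
only a userspace sketch: bytes_to_lblks() and struct lblk_range are names
made up for illustration, not ext4 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not an ext4 structure. */
struct lblk_range {
	uint32_t first;	/* first logical block fully inside the byte range */
	uint32_t last;	/* last logical block fully inside the byte range */
	int empty;	/* non-zero when no whole block is covered */
};

/* Convert a byte range to the whole logical blocks it covers. */
struct lblk_range bytes_to_lblks(uint64_t offset, uint64_t len,
				 unsigned int blkbits)
{
	struct lblk_range r = { 0, 0, 1 };
	uint64_t blksize, first, end;

	assert(blkbits < 32);
	blksize = 1ULL << blkbits;
	first = (offset + blksize - 1) >> blkbits;	/* round start up */
	end = (offset + len) >> blkbits;		/* round end down, exclusive */

	if (end > first) {
		r.first = (uint32_t)first;
		r.last = (uint32_t)(end - 1);
		r.empty = 0;
	}
	return r;
}
```

With 4K blocks, the punch (300K-500K) covers logical blocks 75-124 and
the zeroed range (400K-600K) covers blocks 100-149, so the blocks left
behind after replay are 100-124.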


>
> -ritesh
>
> >
> > Change to use ext4_ext_remove_space() instead of ext4_punch_hole()
> > to remove blocks of inode directly.
> >
> > Signed-off-by: Xin Yin <yinxin.x@xxxxxxxxxxxxx>
> > ---
> >  fs/ext4/fast_commit.c | 13 ++++++++-----
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > index aa05b23f9c14..3deb97b22ca4 100644
> > --- a/fs/ext4/fast_commit.c
> > +++ b/fs/ext4/fast_commit.c
> > @@ -1708,11 +1708,14 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
> >               }
> >       }
> >
> > -     ret = ext4_punch_hole(inode,
> > -             le32_to_cpu(lrange.fc_lblk) << sb->s_blocksize_bits,
> > -             le32_to_cpu(lrange.fc_len) <<  sb->s_blocksize_bits);
> > -     if (ret)
> > -             jbd_debug(1, "ext4_punch_hole returned %d", ret);
> > +     down_write(&EXT4_I(inode)->i_data_sem);
> > +     ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
> > +                             lrange.fc_lblk + lrange.fc_len - 1);
> > +     up_write(&EXT4_I(inode)->i_data_sem);
> > +     if (ret) {
> > +             iput(inode);
> > +             return 0;
> > +     }
> >       ext4_ext_replay_shrink_inode(inode,
> >               i_size_read(inode) >> sb->s_blocksize_bits);
> >       ext4_mark_inode_dirty(NULL, inode);
> > --
> > 2.20.1
> >
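#define _GNU_SOURCE	/* for fallocate() and the FALLOC_FL_* flags */

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>

#define FILE_SIZE	1024000	/* 1000K: size of foo */
#define HOLE_START	409600	/* 400K: start of the zeroed range */
#define HOLE_LEN	204800	/* 200K: length of both ranges */
#define HOLE_SHIFT	102400	/* 100K: punch starts this far before HOLE_START */

int main(int argc, char *argv[])
{
	int fd;
	int fd_bar;
	int ret;
	ssize_t written;
	size_t offset;
	size_t chunk;
	char *data_foo;
	char bar_path[256];
	const char *text_foo = "ddddddddddklmnopqrstuvwxyz123456";
	const size_t text_len = strlen(text_foo);	/* 32 bytes */

	if (argc != 2) {
		printf("usage: a.out [file Path]\n");
		exit(1);
	}

	/* bar lives next to foo: <path>_bar */
	ret = snprintf(bar_path, sizeof(bar_path), "%s_bar", argv[1]);
	if (ret < 0 || (size_t)ret >= sizeof(bar_path)) {
		printf("file path too long! \n");
		exit(1);
	}

	printf("file path: %s \n", argv[1]);
	printf("file_bar path: %s \n", bar_path);

	fd = open(argv[1], O_CREAT | O_RDWR, 0755);
	if (fd < 0) {
		printf("open err! [%s] \n", strerror(errno));
		exit(1);
	}

	data_foo = malloc(FILE_SIZE);
	if (!data_foo) {
		printf("malloc err! \n");
		exit(1);
	}

	/* Fill the buffer with the repeating pattern; the last chunk may be short. */
	for (offset = 0; offset < FILE_SIZE; offset += chunk) {
		chunk = FILE_SIZE - offset;
		if (chunk > text_len)
			chunk = text_len;
		memcpy(data_foo + offset, text_foo, chunk);
	}

	/* Step 1: create & write foo (len 1000K). */
	written = pwrite(fd, data_foo, FILE_SIZE, 0);
	if (written != FILE_SIZE) {
		printf("write err! [%zd] \n", written);
		exit(1);
	}
	free(data_foo);

	/* Step 2: zero range 400K-600K of foo. */
	ret = fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			HOLE_START, HOLE_LEN);
	if (ret < 0) {
		printf("fallocate zero range err! [%s] \n", strerror(errno));
		exit(1);
	}

	/* Step 3: create & fsync bar. */
	fd_bar = open(bar_path, O_CREAT | O_RDWR, 0755);
	if (fd_bar < 0) {
		printf("open fd_bar err! [%s] \n", strerror(errno));
		exit(1);
	}
	if (fsync(fd_bar) < 0) {
		printf("fsync fd_bar err! [%s] \n", strerror(errno));
		exit(1);
	}
	close(fd_bar);

	/* Step 4: punch hole 300K-500K in foo, then fsync foo. */
	ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			HOLE_START - HOLE_SHIFT, HOLE_LEN);
	if (ret < 0) {
		printf("fallocate punch hole err! [%s] \n", strerror(errno));
		exit(1);
	}
	if (fsync(fd) < 0) {
		printf("fsync err! [%s] \n", strerror(errno));
		exit(1);
	}
	close(fd);

	/* Step 5 (crash before a full commit) has to be triggered externally. */
	exit(0);
}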
