On Mon 24-05-21 09:14:16, Junxiao Bi wrote: > On 5/24/21 1:55 AM, Jan Kara wrote: > > > On Fri 21-05-21 16:36:12, Junxiao Bi wrote: > > > When fallocate punches holes out of inode size, if original isize is in > > > the middle of last cluster, then the part from isize to the end of the > > > cluster will be zeroed with buffer write, at that time isize is not > > > yet updated to match the new size, if writeback is kicked in, it will > > > invoke ocfs2_writepage()->block_write_full_page() where the pages out > > > of inode size will be dropped. That will cause file corruption. Fix > > > this by zero out eof blocks when extending the inode size. > > > > > > Running the following command with qemu-image 4.2.1 can get a corrupted > > > coverted image file easily. > > > > > > qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ > > > -O qcow2 -o compat=1.1 $qcow_image.conv > > > > > > The usage of fallocate in qemu is like this, it first punches holes out of > > > inode size, then extend the inode size. > > > > > > fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 > > > fallocate(11, 0, 2276196352, 65536) = 0 > > > > > > v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html > > > > > > Cc: <stable@xxxxxxxxxxxxxxx> > > > Cc: Jan Kara <jack@xxxxxxx> > > > Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx> > > > --- > > > > > > Changes in v2: > > > - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. > > > > > > fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 47 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > > > index f17c3d33fb18..17469fc7b20e 100644 > > > --- a/fs/ocfs2/file.c > > > +++ b/fs/ocfs2/file.c > > > @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, > > > return ret; > > > } > > > +/* > > > + * zero out partial blocks of one cluster. > > > + * > > > + * start: file offset where zero starts, will be made upper block aligned. > > > + * len: it will be trimmed to the end of current cluster if "start + len" > > > + * is bigger than it. > > You write this here but ... > > > > > + */ > > > +static int ocfs2_zeroout_partial_cluster(struct inode *inode, > > > + u64 start, u64 len) > > > +{ > > > + int ret; > > > + u64 start_block, end_block, nr_blocks; > > > + u64 p_block, offset; > > > + u32 cluster, p_cluster, nr_clusters; > > > + struct super_block *sb = inode->i_sb; > > > + u64 end = ocfs2_align_bytes_to_clusters(sb, start); > > > + > > > + if (start + len < end) > > > + end = start + len; > > ... here you check actually something else and I don't see where else would > > the trimming happen. > > Before the "if", end = ocfs2_align_bytes_to_clusters(sb, start), that is > the end of the cluster where "start" located. Ah sorry, I got confused. The code is correct. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR