Can we simply replace i_size_read() with 'orig_isize' and leave the isize
update together with the other dirty inode operations?
I think that fits the dirty inode transaction more comfortably.

Thanks,
Joseph

On 5/26/21 1:58 AM, Junxiao Bi wrote:
> I would like to make the following change to the patch, is that OK with you?
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 17469fc7b20e..775657943057 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1999,9 +1999,12 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	}
>
>  	/* zeroout eof blocks in the cluster. */
> -	if (!ret && change_size && orig_isize < size)
> +	if (!ret && change_size && orig_isize < size) {
>  		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>  					size - orig_isize);
> +		if (!ret)
> +			i_size_write(inode, size);
> +	}
>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>  	if (ret) {
>  		mlog_errno(ret);
> @@ -2018,9 +2021,6 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		goto out_inode_unlock;
>  	}
>
> -	if (change_size && i_size_read(inode) < size)
> -		i_size_write(inode, size);
> -
>  	inode->i_ctime = inode->i_mtime = current_time(inode);
>  	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
>  	if (ret < 0)
>
> Thanks,
>
> Junxiao.
>
> On 5/24/21 7:04 PM, Joseph Qi wrote:
>> Thanks for the explanations.
>> A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later
>> in __ocfs2_change_file_space().
>> Otherwise looks good to me.
>> Reviewed-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
>>
>> On 5/25/21 12:23 AM, Junxiao Bi wrote:
>>> That will not work: the zeroes are written with a buffer write first
>>> and i_size is updated only afterwards, so in between writeback could
>>> kick in and clear those dirty buffers because they were beyond i_size.
>>> Besides that, OCFS2_IOC_RESVSP64 never did the right job: it didn't
>>> take care of eof blocks in the last cluster, so even a simple
>>> fallocate to extend the file size could cause corruption. This patch
>>> fixes both issues.
>>>
>>> Thanks,
>>>
>>> Junxiao.
>>>
>>> On 5/23/21 4:52 AM, Joseph Qi wrote:
>>>> Hi Junxiao,
>>>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
>>>> in __ocfs2_change_file_space(). Why do we have to zeroout first?
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 5/22/21 7:36 AM, Junxiao Bi wrote:
>>>>> When fallocate punches holes out of inode size, if original isize is in
>>>>> the middle of last cluster, then the part from isize to the end of the
>>>>> cluster will be zeroed with buffer write, at that time isize is not
>>>>> yet updated to match the new size, if writeback is kicked in, it will
>>>>> invoke ocfs2_writepage()->block_write_full_page() where the pages out
>>>>> of inode size will be dropped. That will cause file corruption. Fix
>>>>> this by zero out eof blocks when extending the inode size.
>>>>>
>>>>> Running the following command with qemu-img 4.2.1 can get a corrupted
>>>>> converted image file easily.
>>>>>
>>>>>      qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>>>>>               -O qcow2 -o compat=1.1 $qcow_image.conv
>>>>>
>>>>> The usage of fallocate in qemu is like this, it first punches holes out of
>>>>> inode size, then extend the inode size.
>>>>>
>>>>>      fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>>>>      fallocate(11, 0, 2276196352, 65536) = 0
>>>>>
>>>>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>>>>>
>>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>>> Cc: Jan Kara <jack@xxxxxxx>
>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Changes in v2:
>>>>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>>>>>
>>>>>   fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>   1 file changed, 47 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>> index f17c3d33fb18..17469fc7b20e 100644
>>>>> --- a/fs/ocfs2/file.c
>>>>> +++ b/fs/ocfs2/file.c
>>>>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>>>>>  	return ret;
>>>>>  }
>>>>> +/*
>>>>> + * zero out partial blocks of one cluster.
>>>>> + *
>>>>> + * start: file offset where zero starts, will be made upper block aligned.
>>>>> + * len: it will be trimmed to the end of current cluster if "start + len"
>>>>> + *      is bigger than it.
>>>>> + */
>>>>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
>>>>> +					u64 start, u64 len)
>>>>> +{
>>>>> +	int ret;
>>>>> +	u64 start_block, end_block, nr_blocks;
>>>>> +	u64 p_block, offset;
>>>>> +	u32 cluster, p_cluster, nr_clusters;
>>>>> +	struct super_block *sb = inode->i_sb;
>>>>> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
>>>>> +
>>>>> +	if (start + len < end)
>>>>> +		end = start + len;
>>>>> +
>>>>> +	start_block = ocfs2_blocks_for_bytes(sb, start);
>>>>> +	end_block = ocfs2_blocks_for_bytes(sb, end);
>>>>> +	nr_blocks = end_block - start_block;
>>>>> +	if (!nr_blocks)
>>>>> +		return 0;
>>>>> +
>>>>> +	cluster = ocfs2_bytes_to_clusters(sb, start);
>>>>> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
>>>>> +				&nr_clusters, NULL);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +	if (!p_cluster)
>>>>> +		return 0;
>>>>> +
>>>>> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
>>>>> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
>>>>> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * Parts of this function taken from xfs_change_file_space()
>>>>>   */
>>>>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>  {
>>>>>  	int ret;
>>>>>  	s64 llen;
>>>>> -	loff_t size;
>>>>> +	loff_t size, orig_isize;
>>>>>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>  	struct buffer_head *di_bh = NULL;
>>>>>  	handle_t *handle;
>>>>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>  		goto out_inode_unlock;
>>>>>  	}
>>>>> +	orig_isize = i_size_read(inode);
>>>>>  	switch (sr->l_whence) {
>>>>>  	case 0: /*SEEK_SET*/
>>>>>  		break;
>>>>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>  		sr->l_start += f_pos;
>>>>>  		break;
>>>>>  	case 2: /*SEEK_END*/
>>>>> -		sr->l_start += i_size_read(inode);
>>>>> +		sr->l_start += orig_isize;
>>>>>  		break;
>>>>>  	default:
>>>>>  		ret = -EINVAL;
>>>>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>  	default:
>>>>>  		ret = -EINVAL;
>>>>>  	}
>>>>> +
>>>>> +	/* zeroout eof blocks in the cluster. */
>>>>> +	if (!ret && change_size && orig_isize < size)
>>>>> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>>>>> +					size - orig_isize);
>>>>>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>  	if (ret) {
>>>>>  		mlog_errno(ret);
>>>>>
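
For anyone following the range arithmetic in the quoted ocfs2_zeroout_partial_cluster(), a minimal user-space sketch in C may help. This is not the kernel code: partial_cluster_range(), struct zero_range and round_up_u64() are hypothetical names made up for illustration, and the sketch assumes that ocfs2_align_bytes_to_clusters() rounds a byte offset up to a cluster boundary and that ocfs2_blocks_for_bytes() rounds up to a whole number of blocks. The extent lookup and the sb_issue_zeroout() call are left out; only the computation of which file-relative blocks would be zeroed is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: file-relative block range to be zeroed. */
struct zero_range {
	uint64_t start_block;	/* first block to zero */
	uint64_t nr_blocks;	/* number of blocks to zero, may be 0 */
};

/* Round x up to the next multiple of align (align must be nonzero). */
static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Model of the arithmetic in the quoted helper, under the assumptions
 * stated above. 'start' is rounded up to a block boundary, and the
 * range is trimmed to the end of the cluster that contains 'start'.
 */
static struct zero_range partial_cluster_range(uint64_t start, uint64_t len,
					       uint64_t block_size,
					       uint64_t cluster_size)
{
	struct zero_range r;
	uint64_t end, end_block;

	assert(block_size != 0 && cluster_size != 0);
	assert(cluster_size % block_size == 0);

	/* End of the cluster containing 'start' (or 'start' itself if aligned). */
	end = round_up_u64(start, cluster_size);
	if (start + len < end)
		end = start + len;

	r.start_block = round_up_u64(start, block_size) / block_size;
	end_block = round_up_u64(end, block_size) / block_size;
	r.nr_blocks = end_block - r.start_block;
	return r;
}
```

With 4096-byte blocks and 65536-byte clusters, an original isize of 5000 extended by 1 MiB gives start_block 2 and nr_blocks 14, i.e. the rest of the first cluster after the block holding the old EOF. When the original isize is already cluster aligned, nr_blocks comes out as 0 and nothing would be zeroed.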