Re: [PATCH v2] ocfs2: fix data corruption by fallocate

Can we simply replace i_size_read() with 'orig_isize' and leave the isize
update together with the other dirty inode operations?
I think that fits better with the dirty inode transaction.

Thanks,
Joseph 

On 5/26/21 1:58 AM, Junxiao Bi wrote:
> I would like to make the following change to the patch. Is that OK with you?
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 17469fc7b20e..775657943057 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1999,9 +1999,12 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>         }
> 
>         /* zeroout eof blocks in the cluster. */
> -       if (!ret && change_size && orig_isize < size)
> +       if (!ret && change_size && orig_isize < size) {
>                 ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>                                         size - orig_isize);
> +               if (!ret)
> +                       i_size_write(inode, size);
> +       }
>         up_write(&OCFS2_I(inode)->ip_alloc_sem);
>         if (ret) {
>                 mlog_errno(ret);
> @@ -2018,9 +2021,6 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>                 goto out_inode_unlock;
>         }
> 
> -       if (change_size && i_size_read(inode) < size)
> -               i_size_write(inode, size);
> -
>         inode->i_ctime = inode->i_mtime = current_time(inode);
>         ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
>         if (ret < 0)
> 
> Thanks,
> 
> Junxiao.
> 
> On 5/24/21 7:04 PM, Joseph Qi wrote:
>> Thanks for the explanations.
>> One tiny cleanup: we can use 'orig_isize' instead of i_size_read() later
>> in __ocfs2_change_file_space().
>> Otherwise it looks good to me.
>> Reviewed-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
>>
>> On 5/25/21 12:23 AM, Junxiao Bi wrote:
>>> That will not work. The buffered write zeroes the range first and i_size
>>> is updated only afterwards; in between, writeback can kick in and drop
>>> those dirty buffers because they are beyond i_size. Besides that,
>>> OCFS2_IOC_RESVSP64 never did the right job: it did not take care of the
>>> eof blocks in the last cluster, so even a simple fallocate that extends
>>> the file size could cause corruption. This patch fixes both issues.
>>>
>>> Thanks,
>>>
>>> Junxiao.
>>>
>>> On 5/23/21 4:52 AM, Joseph Qi wrote:
>>>> Hi Junxiao,
>>>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
>>>> in __ocfs2_change_file_space(). Why do we have to zeroout first?
>>>>
>>>> Thanks,
>>>> Joseph
>>>>
>>>> On 5/22/21 7:36 AM, Junxiao Bi wrote:
>>>>> When fallocate punches holes beyond the inode size and the original isize
>>>>> is in the middle of the last cluster, the part from isize to the end of
>>>>> the cluster is zeroed with a buffered write. At that time isize has not
>>>>> yet been updated to match the new size, so if writeback kicks in, it
>>>>> invokes ocfs2_writepage()->block_write_full_page(), where the pages beyond
>>>>> the inode size are dropped. That causes file corruption. Fix this by
>>>>> zeroing out the eof blocks when extending the inode size.
>>>>>
>>>>> Running the following command with qemu-img 4.2.1 easily produces a
>>>>> corrupted converted image file.
>>>>>
>>>>>       qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>>>>>                -O qcow2 -o compat=1.1 $qcow_image.conv
>>>>>
>>>>> qemu uses fallocate like this: it first punches holes beyond the
>>>>> inode size, then extends the inode size.
>>>>>
>>>>>       fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>>>>       fallocate(11, 0, 2276196352, 65536) = 0
>>>>>
>>>>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>>>>>
>>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>>> Cc: Jan Kara <jack@xxxxxxx>
>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
>>>>> ---
>>>>>
>>>>> Changes in v2:
>>>>> - As suggested by Jan Kara, use sb_issue_zeroout() to zero the eof blocks on disk directly.
>>>>>
>>>>>    fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>    1 file changed, 47 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>> index f17c3d33fb18..17469fc7b20e 100644
>>>>> --- a/fs/ocfs2/file.c
>>>>> +++ b/fs/ocfs2/file.c
>>>>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>>>>>        return ret;
>>>>>    }
>>>>>
>>>>> +/*
>>>>> + * zero out partial blocks of one cluster.
>>>>> + *
>>>>> + * start: file offset where zero starts, will be made upper block aligned.
>>>>> + * len: it will be trimmed to the end of current cluster if "start + len"
>>>>> + *      is bigger than it.
>>>>> + */
>>>>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
>>>>> +                    u64 start, u64 len)
>>>>> +{
>>>>> +    int ret;
>>>>> +    u64 start_block, end_block, nr_blocks;
>>>>> +    u64 p_block, offset;
>>>>> +    u32 cluster, p_cluster, nr_clusters;
>>>>> +    struct super_block *sb = inode->i_sb;
>>>>> +    u64 end = ocfs2_align_bytes_to_clusters(sb, start);
>>>>> +
>>>>> +    if (start + len < end)
>>>>> +        end = start + len;
>>>>> +
>>>>> +    start_block = ocfs2_blocks_for_bytes(sb, start);
>>>>> +    end_block = ocfs2_blocks_for_bytes(sb, end);
>>>>> +    nr_blocks = end_block - start_block;
>>>>> +    if (!nr_blocks)
>>>>> +        return 0;
>>>>> +
>>>>> +    cluster = ocfs2_bytes_to_clusters(sb, start);
>>>>> +    ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
>>>>> +                &nr_clusters, NULL);
>>>>> +    if (ret)
>>>>> +        return ret;
>>>>> +    if (!p_cluster)
>>>>> +        return 0;
>>>>> +
>>>>> +    offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
>>>>> +    p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
>>>>> +    return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
>>>>> +}
>>>>> +
>>>>>    /*
>>>>>     * Parts of this function taken from xfs_change_file_space()
>>>>>     */
>>>>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>    {
>>>>>        int ret;
>>>>>        s64 llen;
>>>>> -    loff_t size;
>>>>> +    loff_t size, orig_isize;
>>>>>        struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>        struct buffer_head *di_bh = NULL;
>>>>>        handle_t *handle;
>>>>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>            goto out_inode_unlock;
>>>>>        }
>>>>>
>>>>> +    orig_isize = i_size_read(inode);
>>>>>        switch (sr->l_whence) {
>>>>>        case 0: /*SEEK_SET*/
>>>>>            break;
>>>>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>            sr->l_start += f_pos;
>>>>>            break;
>>>>>        case 2: /*SEEK_END*/
>>>>> -        sr->l_start += i_size_read(inode);
>>>>> +        sr->l_start += orig_isize;
>>>>>            break;
>>>>>        default:
>>>>>            ret = -EINVAL;
>>>>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>        default:
>>>>>            ret = -EINVAL;
>>>>>        }
>>>>> +
>>>>> +    /* zeroout eof blocks in the cluster. */
>>>>> +    if (!ret && change_size && orig_isize < size)
>>>>> +        ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>>>>> +                    size - orig_isize);
>>>>>        up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>        if (ret) {
>>>>>            mlog_errno(ret);
>>>>>
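
For readers following the arithmetic in ocfs2_zeroout_partial_cluster() in the
patch quoted above, here is a minimal user-space sketch of how the block range
could be derived. It assumes power-of-two block and cluster sizes. The names
struct zero_range, round_up_pow2() and partial_cluster_range() are hypothetical
and only mirror the round-up behaviour of ocfs2_align_bytes_to_clusters() and
ocfs2_blocks_for_bytes(); they are not kernel APIs, and the physical block
lookup and the sb_issue_zeroout() call are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: logical block range that would be zeroed. */
struct zero_range {
    uint64_t start_block; /* first block at or after 'start' */
    uint64_t nr_blocks;   /* number of blocks up to the trimmed end */
};

/* Round x up to a multiple of align; align must be a power of two. */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Mirror of the range computation in the patch (assumption: plain
 * byte arithmetic instead of the superblock-based kernel helpers).
 *
 * start: file offset where zeroing starts, rounded up to a block.
 * len:   trimmed so that the range never crosses the end of the
 *        cluster that contains 'start'.
 */
static inline struct zero_range partial_cluster_range(uint64_t start,
                                                      uint64_t len,
                                                      uint64_t block_size,
                                                      uint64_t cluster_size)
{
    struct zero_range r;
    /* End of the cluster containing 'start' (equals 'start' if aligned). */
    uint64_t end = round_up_pow2(start, cluster_size);

    /* Trim to the requested length when it ends inside the cluster. */
    if (start + len < end)
        end = start + len;

    /* Both ends are rounded up to a block boundary, as in the patch. */
    r.start_block = round_up_pow2(start, block_size) / block_size;
    r.nr_blocks = round_up_pow2(end, block_size) / block_size - r.start_block;
    return r;
}
```

With 4096-byte blocks and 65536-byte clusters (example values only),
start = 5000 and len = 100000 give start_block = 2 and nr_blocks = 14. A
cluster-aligned start gives nr_blocks = 0, which corresponds to the early
"return 0" in the patch.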


