Re: [PATCH] ext4: fix race aio-dio vs freeze_fs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dmitry Monakhov <dmonakhov@xxxxxxxxxx> writes:

> After freeze_fs was revoked (from Jan Kara) pages's write-back completion
> is deffered before unwritten conversion, so explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we still may face deferred conversion for aio-dio case
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
>     --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: Sane testcase should be integrated to xfstests, but it requires
> changes in common/* code, so let's use this this test at the moment.
>
> In order to fix this race we have to guard journal transaction with explicit
> sb_{start,end}_intwrite()  as we do with ext4_evict_inode here:8e8ad8a5
Fairly to say I'm not very happy with the fix because it continues bad
practice of ad-hock fixes for generic journal vs freeze synchronization

Ideal fix would be to move sb_start_intwrite/sb_end_intwrite() to
ext4_journal_start()/ext4_journal_stop() but this is not possible due to
limitations introduced by nojournal mode (described here:8e8ad8a5)
So let's fix nojournal instead. In order to do that we somehow have
store ref_count and pointer to sb inside nojournal_handle.
There are two possible ways to do that.
1) Embed second journal related field to task_struct and guard it with
   compile macros definition.
void *journal_info;
+ #ifdef CONFIG_EXTRA_JOURNAL_INFO
+   void *journal_info2;
+ #endif

2) Encode ref and sb in to single long. This can be done by aligning
   ext4_sb_info pointer to 4096. So we can embed ref count to lower bits
   like follows.
#define EXT4_NOJOURNAL_SHIFT 12
#define EXT4_NOJOURNAL_MAX_REF_COUNT 1 << (EXT4_NOJOURNAL_SHIFT-1)
#define EXT4_NOJOURNAL_MASK  (1 << EXT4_NOJOURNAL_SHIFT) -1
#define NOJOURNAL_SB(handle) (handle & ~EXT4_NOJOURNAL_MASK)
#define NOJOURNAL_REF(handle) ((handle & ~EXT4_NOJOURNAL_MASK) >> 1)
static int ext4_handle_valid(handle_t *handle)
{
        return !(handle & 0x1);
}
static handle_t *get_nojournal_handle(struct super_block *sb)
{
        handle_t *handle = current->journal_info;
        struct super_block *old_sb  = NOJOURNAL_SB(handle);
        unsigned long ref_cnt = NOJOURNAL_REF(handle);
        BUG_ON(old_sb && old_sb != sb);
        ref++;
        current->journal_info = NOJOURNAL_SB(handle);
}

What do you think about this? Are where any better way to fix this?

>
> Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
> ---
>  fs/ext4/extents.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a6197a..4cba944 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
>  		      map.m_lblk);
>  	/*
> +	 * Protect us against freezing - AIO-DIO case. Caller didn't have to
> +	 * have any protection against it
> +	 */
> +	sb_start_intwrite(inode->i_sb);
> +
> +	/*
>  	 * This is somewhat ugly but the idea is clear: When transaction is
>  	 * reserved, everything goes into it. Otherwise we rather start several
>  	 * smaller transactions for conversion of each extent separately.
> @@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
>  	}
>  	if (!credits)
>  		ret2 = ext4_journal_stop(handle);
> +	sb_end_intwrite(inode->i_sb);
>  	return ret > 0 ? ret2 : ret;
>  }
>  
> -- 
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux