Re: [PATCH 3/3] vfs: fix filesystem_sync vs write race on rw=>ro remount v2

On Sat, Feb 27, 2010 at 03:25:49AM +0300, Dmitry Monakhov wrote:
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e816097..bc79f1f 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -277,6 +277,13 @@ int mnt_want_write(struct vfsmount *mnt)
>  		dec_mnt_writers(mnt);
>  		ret = -EROFS;
>  		goto out;
> +	} else {
> +		/*
> +		 * Clear ST_REMOUNT_RO flag to let remount task know
> +		 * about new writers.
> +		 */
> +		if (unlikely(test_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state)))
> +			clear_bit(ST_REMOUNT_RO, &mnt->mnt_sb->s_state);
>  	}
>  out:
>  	preempt_enable();

I'm not totally sure you've covered the race here. The other side (1),
i.e. the remount path in do_remount_sb():

store s_state <- ST_REMOUNT_RO
smp_mb()
load  write count
spin_unlock()
load  s_state

So to start with, the load of s_state could be moved up above one
of the loads of write_count.

This side (2), i.e. mnt_want_write():
store  write count
smp_mb()
load   s_state
store  s_state

The memory orderings themselves should be OK, although you haven't
updated the existing smp_mb() comments to say that the barrier now also
orders s_state manipulation against write count manipulation, and hence
also pairs with the new smp_mb() you introduced[*].

However, if you use a store/store versus load/load sequence for
synchronization (or load/store versus store/load), then you usually need
to perform them in the opposite order on the two sides, otherwise you
lose synchronization: couldn't the write count be loaded in (1) before it
is incremented in (2), and then s_state be loaded in (1) before it is
cleared in (2)?
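
Spelled out as one possible interleaving (a sketch only, with the call
names taken from your patch and everything else elided):

  remount side (1)                        writer side (2)
  ----------------                        ---------------
  set_bit(ST_REMOUNT_RO, &sb->s_state)
  smp_mb()
  load write counts == 0                  /* fs_may_remount_ro() passes */
  load s_state == set                     /* recheck satisfied early */
                                          inc_mnt_writers(mnt)
                                          smp_mb()
                                          __mnt_is_readonly(mnt) == false
                                          clear_bit(ST_REMOUNT_RO, ...)
  remount proceeds read-only              writer proceeds to write

i.e. both of (1)'s loads observe the values from before (2)'s stores, so
the filesystem goes read-only underneath an active writer.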

[*] please explicitly comment all barriers (including acquire and release
barriers if you use them for something other than just the critical
sections). For *every* barrier site, comment or at least point to
comments that document all actors in the protocol.
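
For example, something along these lines at the mnt_want_write() site
(the wording is only a suggestion):

	/*
	 * Pairs with the smp_mb() in do_remount_sb(): the write count
	 * increment above must be ordered before the ST_REMOUNT_RO
	 * load/clear below, and do_remount_sb()'s ST_REMOUNT_RO store
	 * must be ordered before its write count checks in
	 * fs_may_remount_ro().
	 */
	smp_mb();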


> @@ -581,18 +584,33 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
>  
>  	if (flags & MS_RDONLY)
>  		acct_auto_close(sb);
> +	if (remount_ro) {
> +		if (force)
> +			mark_files_ro(sb);

It's a pity we still need the files list for mark_files_ro.


> +		/*
> +		 * Store the ST_REMOUNT_RO flag. New writers (if any) will
> +		 * clear this bit.
> +		 */
> +		set_bit(ST_REMOUNT_RO, &sb->s_state);
> +		/*
> +		 * This store must be visible before we check for writers below.
> +		 */
> +		smp_mb();
> +		/*
> +		 * To prevent sync vs write race condition we have to check
> +		 * writers before filesystem sync.
> +		 */
> +		if (!fs_may_remount_ro(sb))
> +			return -EBUSY;

I guess in the 'force' case, you may actually want to try harder than
this.

I don't know how you can reasonably do that and still prevent livelock,
though, apart from doing this totally differently and putting a sleeping
lock into mnt_want_write, taken only in the slowpath when a remount
action is pending. That should make things simpler here, but I have no
idea whether mnt_want_write is ever called from atomic context.
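
Roughly what I mean (nothing more than a sketch; s_remount_lock is a
made-up field here, and this assumes mnt_want_write is never called from
atomic context):

int mnt_want_write(struct vfsmount *mnt)
{
	struct super_block *sb = mnt->mnt_sb;
	int ret = 0;

	/*
	 * Slowpath: a rw=>ro remount is pending. Wait for do_remount_sb()
	 * to either fail or finish before counting ourselves as a writer.
	 */
	if (unlikely(test_bit(ST_REMOUNT_RO, &sb->s_state))) {
		mutex_lock(&sb->s_remount_lock);	/* made-up lock */
		mutex_unlock(&sb->s_remount_lock);
	}

	preempt_disable();
	inc_mnt_writers(mnt);
	/* Make the increment visible before we test the ro state. */
	smp_mb();
	if (__mnt_is_readonly(mnt)) {
		dec_mnt_writers(mnt);
		ret = -EROFS;
	}
	preempt_enable();
	return ret;
}

do_remount_sb() would then take s_remount_lock around its whole
set_bit / fs_may_remount_ro / sync sequence, so a pending remount simply
makes new writers sleep instead of racing with the sync.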

Overall I don't know what to think of the direction you're taking here.
I guess it's probably viable, but OTOH if we were able to block in
mnt_want_write then we should be able to eliminate all the races and
still make it work by just traversing the files list. (I would like it
much more if mark_files_ro could go away at the same time, but I don't
see how.)
