Re: [PATCH V3 2/2] md-cluster: Protect communication with mutexes

NeilBrown <neilb@xxxxxxxx> · Mon, 30 Nov 2015 11:58:41 +1100

On Mon, Nov 30 2015, Guoqing Jiang wrote:

> +#define		MD_CLUSTER_SEND_LOCK			4
> +/* If cluster operations must lock the communication channel,
> + * so as to perform extra operations (and no other operation
> + * is allowed on the MD, such as adding a disk. Token needs
> + * to be locked and held until the operation completes with
> + * a md_update_sb(), which would eventually release the lock.
> + */

This comment has unbalanced parentheses.  How did it even compile :-)

But there is something else that isn't as clear as it could be....

>  
> @@ -970,14 +1019,18 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
>  		ret = -ENOENT;
>  	if (ret)
>  		unlock_comm(cinfo);
> -	else
> +	else {
>  		dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
> +		set_bit(MD_CLUSTER_SEND_LOCKED_ALREADY, &cinfo->state);
> +		wake_up(&cinfo->wait);
> +	}

This code suggests that something has caused a metadata update to be
triggered.  i.e. something set MD_CHANGE_DEVS or similar.
That is presumably add_bound_rdev() - which hasn't yet been called, but
while soon after in add_new_disk().

This feels a little bit fragile.  The new code in md-cluster is only
correct if code in add_new_disk does a particular thing.  I guess I'd at
least like to see a comment in add_new_disk() in md-cluster explaining
what will cause MD_CLUSTE_SEND_LOCKED_ALREADY to eventually be cleared.
Maybe it could also schedule the MD_CHANGE_DEVS to be completely certain
that will happen, but that probably isn't necessary.

So I've applied these for now, but if you could fix one comment and add
another that would help.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature