On Mon, Nov 30 2015, Guoqing Jiang wrote: > +#define MD_CLUSTER_SEND_LOCK 4 > +/* If cluster operations must lock the communication channel, > + * so as to perform extra operations (and no other operation > + * is allowed on the MD, such as adding a disk. Token needs > + * to be locked and held until the operation completes with > + * a md_update_sb(), which would eventually release the lock. > + */ This comment has unbalanced parentheses. How did it even compile :-) But there is something else that isn't as clear as it could be.... > > @@ -970,14 +1019,18 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev) > ret = -ENOENT; > if (ret) > unlock_comm(cinfo); > - else > + else { > dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR); > + set_bit(MD_CLUSTER_SEND_LOCKED_ALREADY, &cinfo->state); > + wake_up(&cinfo->wait); > + } This code suggests that something has caused a metadata update to be triggered. i.e. something set MD_CHANGE_DEVS or similar. That is presumably add_bound_rdev() - which hasn't yet been called, but while soon after in add_new_disk(). This feels a little bit fragile. The new code in md-cluster is only correct if code in add_new_disk does a particular thing. I guess I'd at least like to see a comment in add_new_disk() in md-cluster explaining what will cause MD_CLUSTE_SEND_LOCKED_ALREADY to eventually be cleared. Maybe it could also schedule the MD_CHANGE_DEVS to be completely certain that will happen, but that probably isn't necessary. So I've applied these for now, but if you could fix one comment and add another that would help. Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature