Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

Hannes Reinecke <hare@xxxxxxx> · Tue, 1 Sep 2015 14:57:57 +0200



On 09/01/2015 01:15 PM, Christoph Hellwig wrote:
>> +	unsigned long		expiry;
>> +	unsigned long		interval;
>> +	struct workqueue_struct *work_q;
>> +	struct delayed_work	rtpg_work;
>> +	struct delayed_work	stpg_work;
>> +	struct delayed_work	qdata_work;
>> +	spinlock_t		rtpg_lock;
> 
> This one also protects ->flag.  I'd just call it lock, and
> introduce it at the time struct alua_port_group is introduced,
> so that we can have coherent locking from the very beginning.
> 
Okay.

>> +struct alua_queue_data {
>> +	struct list_head	entry;
>>  	activate_complete	callback_fn;
>>  	void			*callback_data;
> 
> I've been looking over the active callback for a while, and I
> think the model of it is fundamentally broken.  Yes, as long as
> multipathing isn't in SCSI we'll need path change notifications
> for dm-mpath, but I don't see how tying them into ->activate
> makes any sense.
> 
> I guess for this series we're stuck with it, but in the long
> run we should have a callback in the request_queue which the
> consumer that has bd_claim()ed the device can set and get path
> change notifications instead.
> 
That is what I'm eventually planning to do.
My final goal is to move the multipath path monitoring stuff
into the kernel (via the block device polling mechanism), and sending
block events for path failure and re-establishment.

But that'll be subject to further patches :-)

> That being said after spending a lot of time reading this code I think
> my orginal advice to use different work_structs for the different
> kinds of events was wrong, and I'd like to apologize for making you
> go down that direction.
> 
> STPG/RTPG is just too dependent to sensibly split them up, and the
> qdata bit while broken can be shoe horned in for now.  So I'd suggest
> to revert back to the original model with a single work_struct per
> port group, and a global workqueue.
> 
Weelll ... there was a reason why I did that initially :-)
But anyway, no harm done. I still have the original series around.

>> -	if (h->pg)
>> +	if (h->pg) {
>> +		synchronize_rcu();
>>  		return SCSI_DH_OK;
>> +	}
> 
> What is this synchronize_rcu supposed to help with?
> 
>> -		h->pg = tmp_pg;
>>  		kref_get(&tmp_pg->kref);
>> +		spin_lock(&h->pg_lock);
>> +		rcu_assign_pointer(h->pg, tmp_pg);
>> +		spin_unlock(&h->pg_lock);
> 
> I think the assignment to h->ph should have been past the kref_get
> from the very beginning, please fold this into the original patch.
> 
Okay.

>> +	struct alua_port_group *pg = NULL;
>> +	int error;
>> +
>> +	mutex_lock(&h->init_mutex);
>> +	error = alua_check_tpgs(sdev, h);
>> +	if (error == SCSI_DH_OK) {
>> +		error = alua_check_vpd(sdev, h);
>> +		rcu_read_lock();
>> +		pg = rcu_dereference(h->pg);
>> +		if (!pg) {
>> +			rcu_read_unlock();
>> +			h->tpgs = TPGS_MODE_NONE;
>> +			error = SCSI_DH_DEV_UNSUPP;
>> +		} else {
>> +			WARN_ON(error != SCSI_DH_OK);
>> +			kref_get(&pg->kref);
>> +			rcu_read_unlock();
>> +		}
>> +	}
> 
> Eww. Please grab the reference to the pg inside the replacement for
> alua_check_vpd, and ensure that we never return from that without a
> valid h->pg.  Together with the previously suggested removal of h->tpgs,
> and moving the call to alua_rtpg_queue into the alua_check_vpd replacement
> that would keep alua_initialize nice and simple.
> 
You are right in that the interface for alua_check_tpgs isn't very
clear. Currently we need to check both the return code _and_ 'h->pg',
even though both are interrelated.
I guess it should rather be 'h->pg = alua_check_vpd()' or even making
alua_check_vpd() a void function, which would eliminate the need for the
separate return value.
I'll have to think about it.

>> @@ -679,6 +895,10 @@ static int alua_set_params(struct scsi_device *sdev, const char *params)
>>  	unsigned int optimize = 0, argc;
>>  	const char *p = params;
>>  	int result = SCSI_DH_OK;
>> +	unsigned long flags;
>> +
>> +	if (!h)
>> +		return -ENXIO;
> 
> How could that happen?  Why does it beling into this patch?
> 
Leftover from the original patch, which didn't have your scsi_dh
patchset. I'll remove it.

>> @@ -723,24 +944,46 @@ static int alua_activate(struct scsi_device *sdev,
>>  {
>>  	struct alua_dh_data *h = sdev->handler_data;
>>  	int err = SCSI_DH_OK;
>> +	struct alua_queue_data *qdata;
>> +	struct alua_port_group *pg;
>>  
>> -	if (!h->pg)
>> +	if (!h) {
>> +		err = SCSI_DH_NOSYS;
>>  		goto out;
>> +	}
> 
> Same here.
> 
>> +	qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
>> +	if (!qdata) {
>> +		err = SCSI_DH_RES_TEMP_UNAVAIL;
>> +		goto out;
>> +	}
>> +	qdata->callback_fn = fn;
>> +	qdata->callback_data = data;
>>  
>> +	mutex_lock(&h->init_mutex);
>> +	rcu_read_lock();
>> +	pg = rcu_dereference(h->pg);
>> +	if (!pg) {
>> +		rcu_read_unlock();
>> +		kfree(qdata);
>> +		err = h->init_error;
>> +		mutex_unlock(&h->init_mutex);
>>  		goto out;
>>  	}
>> +	mutex_unlock(&h->init_mutex);
>> +	fn = NULL;
>> +	kref_get(&pg->kref);
>> +	rcu_read_unlock();
>> +
>> +	if (optimize_stpg) {
>> +		unsigned long flags;
>> +
>> +		spin_lock_irqsave(&pg->rtpg_lock, flags);
>> +		pg->flags |= ALUA_OPTIMIZE_STPG;
>> +		spin_unlock_irqrestore(&pg->rtpg_lock, flags);
>> +	}
> 
> The optimize_stpg should have been moved towards the alua_port_group
> allocation earlier in the series, please fold that in there.
> 
Okay.

>> +	alua_rtpg_queue(pg, sdev, qdata);
>> +	kref_put(&pg->kref, release_port_group);
> 
> Note that all alua_rtpg_queue callers already have a reference to the
> PG, so I don't think we need to grab another one inside it or even
> check for a NULL pg.
> 
In general I try to follow the rule of getting an initial reference, and
another one whenever the port_group structure is put on the workqueue.
But I'll check here if the additional reference is warranted.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html