Re: [PATCH v3 2/4] cxlflash: Base error recovery support

"Matthew R. Ochs" <mrochs@xxxxxxxxxxxxxxxxxx> · Wed, 5 Aug 2015 17:30:42 -0500

Hi Brian,

Thanks for reviewing. Comments inline below.

-matt

> On Aug 5, 2015, at 11:04 AM, Brian King <brking@xxxxxxxxxxxxxxxxxx> wrote:
> 
> On 08/02/2015 11:33 PM, Matthew R. Ochs wrote:
> 
>> diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
>> index ba070a5..3d6217a 100644
>> --- a/drivers/scsi/cxlflash/common.h
>> +++ b/drivers/scsi/cxlflash/common.h
>> @@ -76,6 +76,12 @@ enum cxlflash_init_state {
>> 	INIT_STATE_SCSI
>> };
>> 
>> +enum eeh_state {
>> +	EEH_STATE_NONE,
>> +	EEH_STATE_ACTIVE,
>> +	EEH_STATE_FAILED
>> +};
> 
> Can you use pdev->error_state and pci_channel_offline instead of duplicating this
> state information in a private driver definition?

Makes sense, I’ll look into this.

>> 
>> +#ifdef CONFIG_CXL_EEH
>> +	cxl_perst_reloads_same_image(afu, val) 
>> +#endif
> 
> I'd suggest moving this to a .h and defining the function as a noop there if appropriate, something
> like:
> 
> #ifndef CONFIG_CXL_EEH
> #define cxl_perst_reloads_same_image(cfg->cxl_afu, true) do { } while(0)
> #endif

Done.

>> 
>> -	rcr = afu_reset(cfg);
>> -	if (rcr == 0)
>> -		rc = SUCCESS;
>> -	else
>> -		rc = FAILED;
>> +	switch (cfg->eeh_active) {
>> +	case EEH_STATE_NONE:
>> +		cfg->eeh_active = EEH_STATE_FAILED;
> 
> Seems a little strange to be messing with the EEH state machine here when EEH isn't even at play.
> If you can't switch to use the existing EEH state machine in the pdev struct, suggest renaming
> this internal state machine to something more accurate and using the pdev EEH state machine where you can.
> Same goes for the eeh_waitq…

I do agree that this is a bit strange. What we’re doing here is borrowing the framework we
put in place to quiesce user contexts and hold off new threads coming in during an EEH
event. I’ll look into how we can refactor this given that we’re going to move to using the
existing EEH state machine (pdev->error_state) and will no longer be able to toggle state.

>> +	pr_debug("%s: pdev=%p state=%u\n", __func__, pdev, state);
>> +
>> +	switch (state) {
>> +	case pci_channel_io_frozen:
>> +		cfg->eeh_active = EEH_STATE_ACTIVE;
>> +		udelay(100);
>> +
> 
> I think this udelay needs a comment…

This may end up going away. I’ll add a comment if we keep it.

> I'd suggest calling scsi_block_requests here to stop your queuecommand function from being called.
> Note that this won't stop EH commands from being sent, so you will still need to check this
> in queuecommand, although the right thing to do may be to fix scsi_send_eh_cmnd to not call
> queuecommand if the host is blocked.
> 
> You’d then need to call scsi_unblock_requests when EEH in the perm failure and resume cases.

Good suggestion, we’ll look at adding this in.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html