Re: [PATCH v3 2/4] cxlflash: Base error recovery support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, 

> @@ -1857,9 +1884,18 @@ static int cxlflash_eh_device_reset_handler(struct scsi_cmnd *scp)
>  		 get_unaligned_be32(&((u32 *)scp->cmnd)[2]),
>  		 get_unaligned_be32(&((u32 *)scp->cmnd)[3]));
>  
> -	rcr = send_tmf(afu, scp, TMF_LUN_RESET);
> -	if (unlikely(rcr))
> -		rc = FAILED;
> +	switch (cfg->eeh_active) {
> +	case EEH_STATE_NONE:
> +		rcr = send_tmf(afu, scp, TMF_LUN_RESET);
> +		if (unlikely(rcr))
> +			rc = FAILED;
> +		break;
> +	case EEH_STATE_ACTIVE:
> +		wait_event(cfg->eeh_waitq, cfg->eeh_active != EEH_STATE_ACTIVE);
> +		break;
> +	case EEH_STATE_FAILED:
> +		break;
> +	}
>  

Looking at the context here, it looks like rc gets initalised to
SUCCESS. In that case, in the EEH failed case, you'll return SUCCESS.
I'm not particularly clear on the semantics here: does that make sense?

Likewise, in the EEH active state, when you finish the wait_event,
should you check if the EEH state went to NONE or FAILED before you
break?

There's a similar case below in host_reset.

>  	pr_debug("%s: returning rc=%d\n", __func__, rc);
>  	return rc;
> @@ -1889,11 +1925,23 @@ static int cxlflash_eh_host_reset_handler(struct scsi_cmnd *scp)
>  		 get_unaligned_be32(&((u32 *)scp->cmnd)[2]),
>  		 get_unaligned_be32(&((u32 *)scp->cmnd)[3]));
>  
> -	rcr = afu_reset(cfg);
> -	if (rcr == 0)
> -		rc = SUCCESS;
> -	else
> -		rc = FAILED;
> +	switch (cfg->eeh_active) {
> +	case EEH_STATE_NONE:
> +		cfg->eeh_active = EEH_STATE_FAILED;
> +		rcr = afu_reset(cfg);
> +		if (rcr == 0)
> +			rc = SUCCESS;
> +		else
> +			rc = FAILED;
> +		cfg->eeh_active = EEH_STATE_NONE;
> +		wake_up_all(&cfg->eeh_waitq);
> +		break;
> +	case EEH_STATE_ACTIVE:
> +		wait_event(cfg->eeh_waitq, cfg->eeh_active != EEH_STATE_ACTIVE);
> +		break;
> +	case EEH_STATE_FAILED:
> +		break;
> +	}
>  
>  	pr_debug("%s: returning rc=%d\n", __func__, rc);
>  	return rc;
> @@ -2145,6 +2193,11 @@ static void cxlflash_worker_thread(struct work_struct *work)
>  	int port;
>  	ulong lock_flags;
>  
> +	/* Avoid MMIO if the device has failed */
> +
> +	if (cfg->eeh_active == EEH_STATE_FAILED)
> +		return;
> +
Should this check be != EEH_STATE_NONE? Or is this called within the
error recovery process?

>  	spin_lock_irqsave(cfg->host->host_lock, lock_flags);
>  
>  	if (cfg->lr_state == LINK_RESET_REQUIRED) {
> @@ -2226,6 +2279,8 @@ static int cxlflash_probe(struct pci_dev *pdev,
>  
>  	cfg->init_state = INIT_STATE_NONE;
>  	cfg->dev = pdev;
> +
> +	cfg->eeh_active = EEH_STATE_NONE;
>  	cfg->dev_id = (struct pci_device_id *)dev_id;
>  
> 
> @@ -2286,6 +2341,85 @@ out_remove:
>  	goto out;
>  }
>  
> +/**
> + * cxlflash_pci_error_detected() - called when a PCI error is detected
> + * @pdev:	PCI device struct.
> + * @state:	PCI channel state.
> + *
> + * Return: PCI_ERS_RESULT_NEED_RESET or PCI_ERS_RESULT_DISCONNECT
> + */
> +static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
> +						    pci_channel_state_t state)
> +{
> +	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
> +
> +	pr_debug("%s: pdev=%p state=%u\n", __func__, pdev, state);
> +
> +	switch (state) {
> +	case pci_channel_io_frozen:
> +		cfg->eeh_active = EEH_STATE_ACTIVE;
> +		udelay(100);
> +
> +		term_mc(cfg, UNDO_START);
> +		stop_afu(cfg);
> +
> +		return PCI_ERS_RESULT_CAN_RECOVER;

I think that should PCI_ERS_RESULT_NEED_RESET.

Apart from that, it's looking pretty good from an EEH perspective.

-- 
Regards,
Daniel

Attachment: signature.asc
Description: This is a digitally signed message part


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux