Re: [PATCH 5/6] qla2xxx: Prevent removal and board_disable race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 25 Jul 2014 11:23:02 -0400
Chad Dupuis <chad.dupuis@xxxxxxxxxx> wrote:

> 
> 
> On Wed, 18 Jun 2014, Joe Lawrence wrote:
> 
> > Introduce mutual exclusion between the qla2xxx_remove_one PCI driver
> > callback and qla2x00_disable_board_on_pci_error, which is scheduled as
> > board_disable work by qla2x00_check_reg{32,16}_for_disconnect:
> >
> > * Leave the driver-specific data attached to the underlying PCI device
> > intact in qla2x00_disable_board_on_pci_error, so that qla2x00_remove_one
> > has enough breadcrumbs to determine that any board_disable work has been
> > completed.
> >
> > * In qla2xxx_remove_one, set a bit to prevent any subsequent
> > board_disable work from scheduling, then cancel and wait until pending
> > work has completed.
> >
> > * Reuse the PCI device enable count check in qla2x00_remove_one to
> > determine if board_disable has occured.  The original purpose of this
> > check was unnecessary since the driver remove function wasn't called
> > when the probe fails.
> >
> > Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxxx>
> > ---
> > drivers/scsi/qla2xxx/qla_def.h |    1 +
> > drivers/scsi/qla2xxx/qla_isr.c |    3 ++-
> > drivers/scsi/qla2xxx/qla_os.c  |   31 +++++++++++++++++++------------
> > 3 files changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
> > index 1267b11..7c441c9 100644
> > --- a/drivers/scsi/qla2xxx/qla_def.h
> > +++ b/drivers/scsi/qla2xxx/qla_def.h
> > @@ -3404,6 +3404,7 @@ typedef struct scsi_qla_host {
> >
> > 	unsigned long	pci_flags;
> > #define PFLG_DISCONNECTED	0	/* PCI device removed */
> > +#define PFLG_DRIVER_REMOVING	1	/* PCI driver .remove */
> >
> > 	uint32_t	device_flags;
> > #define SWITCH_FOUND		BIT_0
> > diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> > index 2741ad8..ee5eef4 100644
> > --- a/drivers/scsi/qla2xxx/qla_isr.c
> > +++ b/drivers/scsi/qla2xxx/qla_isr.c
> > @@ -117,7 +117,8 @@ qla2x00_check_reg32_for_disconnect(scsi_qla_host_t *vha, uint32_t reg)
> > {
> > 	/* Check for PCI disconnection */
> > 	if (reg == 0xffffffff) {
> > -		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags)) {
> > +		if (!test_and_set_bit(PFLG_DISCONNECTED, &vha->pci_flags) &&
> > +		    !test_bit(PFLG_DRIVER_REMOVING, &vha->pci_flags)) {
> 
> In our remove function we set a bit that we are unloading:
> 
> set_bit (UNLOADING, &base_vha->dpc_flags);
> 
> could this be used instead?

Hi Chad,

Thanks for the comments.

I think with a little bit of code shuffling this could be accomplished.
My goal with the patchset was to try and present the problem/fix as
plain as possible.  It was easiest to collect all the atomic bits I
needed inside a single variable.  Should I be tacking on such flags to
'dpc_flags' ?

> 
> > 			/*
> > 			 * Schedule this (only once) on the default system
> > 			 * workqueue so that all the adapter workqueues and the
> > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> > index 39c9953..51cba37 100644
> > --- a/drivers/scsi/qla2xxx/qla_os.c
> > +++ b/drivers/scsi/qla2xxx/qla_os.c
> > @@ -3123,15 +3123,25 @@ qla2x00_remove_one(struct pci_dev *pdev)
> > 	scsi_qla_host_t *base_vha;
> > 	struct qla_hw_data  *ha;
> >
> > +	base_vha = pci_get_drvdata(pdev);
> > +	ha = base_vha->hw;
> > +
> > +	/* Indicate device removal to prevent future board_disable and wait
> > +	 * until any pending board_disable has completed. */
> > +	set_bit(PFLG_DRIVER_REMOVING, &base_vha->pci_flags);
> > +	cancel_work_sync(&ha->board_disable);
> > +
> > 	/*
> > -	 * If the PCI device is disabled that means that probe failed and any
> > -	 * resources should be have cleaned up on probe exit.
> > +	 * If the PCI device is disabled then there was a PCI-disconnect and
> > +	 * qla2x00_disable_board_on_pci_error has taken care of most of the
> > +	 * resources.
> > 	 */
> > -	if (!atomic_read(&pdev->enable_cnt))
> > +	if (!atomic_read(&pdev->enable_cnt)) {
> 
> Should we also add a check here that this is a disconnection before 
> freeing these structs?  The original intent of the check for 
> pdev->enable_cnt is to make sure we don't try to dereference an already 
> freed struct if probe failed.

I'm not exactly sure what you're asking here.  In my tests, when .probe
return -ERRNO, .remove was not called.  Is there another call path into
qla2x00_remove_one?

The reason I didn't completely cleanup qla_hw_data in
qla2x00_disable_board_on_pci_error and re-purposed the PCI enable count
check, was that I needed some way of determining that any
board_disable work was out of the way before proceeding with
qla2x00_remove_one.

The patch's set_bit / cancel_work_sync above (along with the test_bit
before board_disable schedule) should be ensuring that the
board_disable won't be running or rescheduling in the future.

If qla2x00_disable_board_on_pci_error got as far as actually disabling
the PCI device, then qla2x00_remove_one's check on its enable count
would be verifying that.  If the device is still enabled, then
qla2x00_remove_one knows that board_disable didn't clean up the device.

Would it be clearer if I used an explicit scsi_qla_host flag to
indicate that state?

> 
> > +		scsi_host_put(base_vha->host);
> > +		kfree(ha);
> > +		pci_set_drvdata(pdev, NULL);
> > 		return;
> > -
> > -	base_vha = pci_get_drvdata(pdev);
> > -	ha = base_vha->hw;
> > +	}
> >
> > 	qla2x00_wait_for_hba_ready(base_vha);
> >
> > @@ -4791,18 +4801,15 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work)
> > 	qla82xx_md_free(base_vha);
> > 	qla2x00_free_queues(ha);
> >
> > -	scsi_host_put(base_vha->host);
> > -
> > 	qla2x00_unmap_iobases(ha);
> >
> > 	pci_release_selected_regions(ha->pdev, ha->bars);
> > -	kfree(ha);
> > -	ha = NULL;
> > -
> > 	pci_disable_pcie_error_reporting(pdev);
> > 	pci_disable_device(pdev);
> > -	pci_set_drvdata(pdev, NULL);
> >
> > +	/*
> > +	 * Let qla2x00_remove_one cleanup qla_hw_data on device removal.
> > +	 */
> > }
> >
> > /**************************************************************************
> >

Regards,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux