Re: [PATCH V3, 1/1] PCI/AER: fix use-after-free in pcie_do_fatal_recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 26, 2018 at 10:29:18AM -0400, Thomas Tai wrote:
> [ ... ]>
> > I know I suggested this strategy, but I think this ended up being more
> > complicated than it's worth.
> > 
> > The problem code in pcie_do_fatal_recovery() essentially looks like
> > this:
> > 
> >    pcie_do_fatal_recovery(dev)
> >      pci_stop_and_remove_bus_device(dev);
> >      reset_link(dev);
> >      pci_cleanup_aer_uncorrect_error_status(dev);
> >      pcie_wait_for_link(dev, ...);
> >      pci_uevent_ers(dev, ...);
> >      pci_info(dev, ...);
> > 
> > Some of this depends on the device type (bridge vs. endpoint) and the
> > caller (AER vs. DPC), but given the right conditions, we can exercise
> > all the above calls.
> > 
> > I think it is just broken that we keep doing things with "dev" after
> > removing it.  IMHO this code should be restructured to avoid that.
> > 
> > I think fiddling with the refcount as in this patch adds too much
> > complexity and makes it look like the current structure of
> > pcie_do_fatal_recovery() is reasonable when it really isn't.
> > 
> > But restructuring pcie_do_fatal_recovery() is too big a project to do
> > before v4.18, and we need to fix this problem.  I propose that we
> > merge your v2 patch for now, so at least the band-aid is in the
> > function that I think is broken.
> > 
> > I *would* like to reduce the scope of the get/put as in the patch
> > below, though, so it is contained inside the rescan_remove lock.
> > Could you try it and make sure it's still enough to avoid the problem?
> > If it is, I'll add your sign-off and get this in v4.18.
> 
> Hi Bjorn,
> Thank you for your review and the details analysis. Sure, let's do the work
> around for now. I retested your patch below and works fine. You are welcome
> to add my signed-off and get this in v4.18.

OK, I added your signed-off-by and put the patch below on my for-linus
branch for v4.18.

> As far as reworking the pcie_do_fatal_recovery() goes, would you think I can
> help out in any way? May be I can try rework the code to not use the dev
> after it is removed.

That'd be great!  I expect Oza and Keith will have useful insight
there, too, so keep them in the loop.

> > commit 277ce38f2ed6a4310acf3bd541fb3aee4ec27dee
> > Author: Thomas Tai <thomas.tai@xxxxxxxxxx>
> > Date:   Tue Jul 24 16:47:59 2018 -0500
> > 
> >      PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
> >      When an fatal error is received by a non-bridge device, the device is
> >      removed, and pci_stop_and_remove_bus_device() deallocates the device
> >      structure.  The freed device structure is used by subsequent code to send
> >      uevents and print messages.
> >      Hold a reference on the device until we're finished using it.  This is not
> >      an ideal fix because pcie_do_fatal_recovery() should not use the device at
> >      all after removing it, but that's too big a project for right now.
> >      #
> >      [bhelgaas: changelog, reduce get/put coverage]
> >      Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > 
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index fdbcc555860d..674984a9277a 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -291,6 +291,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
> >   	parent = udev->subordinate;
> >   	pci_lock_rescan_remove();
> > +	pci_dev_get(dev);
> >   	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
> >   					 bus_list) {
> >   		pci_dev_get(pdev);
> > @@ -325,6 +326,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
> >   		pci_info(dev, "Device recovery from fatal error failed\n");
> >   	}
> > +	pci_dev_put(dev);
> >   	pci_unlock_rescan_remove();
> >   }
> > 
> > > Signed-off-by: Thomas Tai <thomas.tai@xxxxxxxxxx>
> > > ---
> > >   drivers/pci/pcie/aer.c | 27 +++++++++++++++++++++++++--
> > >   1 file changed, 25 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index a2e8838..6e5e6a5 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -657,6 +657,10 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
> > >   static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
> > >   {
> > >   	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
> > > +		/* increment reference count to keep the dev
> > > +		 * around until remove_source_device()
> > > +		 */
> > > +		pci_dev_get(dev);
> > >   		e_info->dev[e_info->error_dev_num] = dev;
> > >   		e_info->error_dev_num++;
> > >   		return 0;
> > > @@ -665,6 +669,21 @@ static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
> > >   }
> > >   /**
> > > + * remove_source_device -remove error devices from the e_info
> > > + * @e_info: pointer to error info
> > > + */
> > > +static void remove_source_device(struct aer_err_info *e_info)
> > > +{
> > > +	struct pci_dev *dev;
> > > +
> > > +	while (e_info->error_dev_num > 0) {
> > > +		e_info->error_dev_num--;
> > > +		dev = e_info->dev[e_info->error_dev_num];
> > > +		pci_dev_put(dev);
> > > +	}
> > > +}
> > > +
> > > +/**
> > >    * is_error_source - check whether the device is source of reported error
> > >    * @dev: pointer to pci_dev to be checked
> > >    * @e_info: pointer to reported error info
> > > @@ -976,8 +995,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
> > >   			e_info->multi_error_valid = 0;
> > >   		aer_print_port_info(pdev, e_info);
> > > -		if (find_source_device(pdev, e_info))
> > > +		if (find_source_device(pdev, e_info)) {
> > >   			aer_process_err_devices(e_info);
> > > +			remove_source_device(e_info);
> > > +		}
> > >   	}
> > >   	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
> > > @@ -995,8 +1016,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
> > >   		aer_print_port_info(pdev, e_info);
> > > -		if (find_source_device(pdev, e_info))
> > > +		if (find_source_device(pdev, e_info)) {
> > >   			aer_process_err_devices(e_info);
> > > +			remove_source_device(e_info);
> > > +		}
> > >   	}
> > >   }
> > > -- 
> > > 1.8.3.1
> > > 



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux