Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed

On Thu, 2018-08-16 at 13:52 +0530, poza@xxxxxxxxxxxxxx wrote:
> 
> ok lets start with what we have rather than going back, because 
> reverting the changes is not going to solve anything

I'm not saying you should revert the DPC/AER changes, but I want to
revert the *spec* changes: they are wrong, and they mean EEH no longer
matches the spec.

I.e. Documentation/PCI/pci-error-recovery.txt

> as I mentioned the behavior of some of the functions and DPC (was the 
> same before and now)
> but the good thing happened because of the patches is; there is a common 
> framework defined in err.c
> and DPC and AER both act on similar rules (the rule is what we define 
> understanding of SPEC)

Depends what you call spec. I'm talking about the Linux error recovery
specification.

Let's not muddle it with the PCIe spec itself, or I'll start quoting
Linus about the general usefulness of specs :-)

What we need to do is decide how we want Linux and drivers to behave
for error recovery, more or less *regardless* of what the PCIe spec
says, because HW specs are invariably wrong and HW implementations
ignore them more often than not anyway.

> and all we have to do is discuss and evolve it or change it
> we can catch up on webex, (Sinan is going to be there in Plumber's 
> conference, I might not be able to join there, as we have bring-up 
> coming)

Ok, I'll try to get there. Let's plan at least a BOF or two if not a
microconf.

To set up a webex, let's first list who needs to attend and their
respective timezones so we can figure out a time. I'm on the east coast
of Australia.

> > > The way DPC used to behave in 2016, is still the same; which involved
> > > removing and re-enumerating the devices.
> > 
> > Which is mostly useless for anything that isn't a network device.
> > 
> > We've been doing EEH for something like 15 to 20 years, so we have a
> > long experience with what it takes to get PCI(e) devices to recover on
> > enterprise systems.
> > 
> > Removing and re-enumerating is one of the very worst things you can do
> > in that area.
> > 
> > Cheers,
> > Ben.



