On Thu, 2018-08-16 at 13:52 +0530, poza@xxxxxxxxxxxxxx wrote:
>
> ok lets start with what we have rather than going back, because
> reverting the changes is not going to solve anything

I'm not saying you should revert the DPC/AER changes, but I want to
revert the *spec* changes; they are wrong and they make EEH no longer
match the spec.

I.e. Documentation/PCI/pci-error-recovery.txt

> as I mentioned the behavior of some of the functions and DPC (was the
> same before and now)
> but the good thing happened because of the patches is; there is a common
> framework defined in err.c
> and DPC and AER both act on similar rules (the rule is what we define
> understanding of SPEC)

Depends what you call spec. I'm talking about the Linux error recovery
specification. Let's not muddle it with the PCIe spec itself, or I'll
start quoting Linus about the general usefulness of specs :-)

What we need to do is decide how we want Linux and drivers to behave
for error recovery, more or less *regardless* of what the PCIe spec
says, because HW specs are invariably wrong and HW implementations
ignore them more often than not anyway.

> and all we have to do is discuss and evolve it or change it
> we can catch up on webex, (Sinan is going to be there in Plumber's
> conference, I might not be able to join there, as we have bring-up
> coming)

Ok, I'll try to get there. Let's plan at least a BOF or two if not a
microconf.

To set up a webex, let's first list who needs to attend and their
respective timezones so we can figure out a time. I'm on the east coast
of Australia.

> > > The way DPC used to behave in 2016, is still the same; which involved
> > > removing and re-enumerating the devices.
> >
> > Which is mostly useless for anything that isn't a network device.
> >
> > We've been doing EEH for something like 15 to 20 years, so we have a
> > long experience with what it takes to get PCI(e) devices to recover on
> > enterprise systems.
> >
> > Removing and re-enumerating is one of the very worst thing you can do
> > in that area.
> >
> > Cheers,
> > Ben.