On Thu, 2018-08-16 at 13:35 +0530, poza@xxxxxxxxxxxxxx wrote:
> >
> > Bjorn, we are the main authors of that spec (Linas wrote it under my
> > supervision) and created those callbacks for EEH. AER picked them up
> > only later. Those changes must be at the very least acked by us before
> > going upstream.
> >
> > Ben.
> >
> + Sinan
>
> This patch set was there in mailing list for nearly 17 to 18 revisions
> for 7 months.

Right, and sadly the guy doing EEH on our side left and I didn't notice
what was going on in the list. But Bjorn should know better :-)

> besides the intent was to bring DPC and AER into the same well defined
> way of error handling.

That's a good idea, but we need to fix DPC's and AER's understanding of
the intent of those callbacks, not change the spec to match the broken
implementation.

> The way DPC used to behave in 2016, is still the same; which involved
> removing and re-enumerating the devices.

Which is mostly useless for anything that isn't a network device.

We've been doing EEH for something like 15 to 20 years, so we have long
experience with what it takes to get PCI(e) devices to recover on
enterprise systems. Removing and re-enumerating is one of the very worst
things you can do in that area.

Cheers,
Ben.