On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
> [+cc Keith, Gabriele, Dongdong]
>
> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
> > Devices can go offline when EEH is reported. This patch adds
> > a change to the kernel object and lets udev know of error.
> > When device resumes a change is also set reporting device as
> > online. Therefore, EEH events are better propagated to user
> > space for devices in powerpc arch.
>
> I'm on vacation and can't review this in detail, but I wonder if you
> can compare this with the uevents we emit for DPC, AER, and hotplug
> events (if any). I hope we don't end up with userspace having to be
> aware of the differences between EEH, DPC, AER, etc.
>
> From a very quick look, I only see a few uevents even mentioned in
> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
> SR-IOV code. I'm worried that we're missing some important uevents in
> the PCI core. That's not an argument against what you're doing here;
> it just would be nice to fill in any missing pieces in the core also,
> and hopefully make them consistent with these EEH events.

I don't think this needs to be particularly complex, could we get away
with events for when devices do the following?

 - begin recovery
 - successfully recover
 - fail recovery

It might be worthwhile sorting out some consistent, non-EEH-specific
naming, and then other device error recovery systems can do the same
later.

- Russell

> >
> > Signed-off-by: Bryant G. Ly <bryantly@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Juan J. Alvarez <jjalvare@xxxxxxxxxxxxxxxxxx>