On 12/18/17 10:59 PM, Russell Currey wrote:
> On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
>> [+cc Keith, Gabriele, Dongdong]
>>
>> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
>>> Devices can go offline when EEH is reported. This patch adds
>>> a change to the kernel object and lets udev know of error.
>>> When device resumes a change is also set reporting device as
>>> online. Therefore, EEH events are better propagated to user
>>> space for devices in powerpc arch.
>> I'm on vacation and can't review this in detail, but I wonder if you
>> can compare this with the uevents we emit for DPC, AER, and hotplug
>> events (if any). I hope we don't end up with userspace having to be
>> aware of the differences between EEH, DPC, AER, etc.
>>
>> From a very quick look, I only see a few uevents even mentioned in
>> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
>> SR-IOV code. I'm worried that we're missing some important uevents
>> in the PCI core.

The only place where I see KOBJ_REMOVE being used is when the device is
removed in pci_destroy_dev -> device_del, which will be called implicitly
in the permanent failure path of the EEH code.

>> That's not an argument against what you're doing here;
>> it just would be nice to fill in any missing pieces in the core also,
>> and hopefully make them consistent with these EEH events.
> I don't think this needs to be particularly complex, could we get away
> with events for when devices do the following?
>
> - begin recovery
> - successfully recover
> - fail recovery

If there are no objections in the ongoing review of this patch, I can
change them to these names:

- BEGIN_RECOVERY
- SUCCESSFUL_RECOVERY
- FAILED_RECOVERY

> It might be worthwhile sorting out some consistent, non-EEH-specific
> naming, and then other device error recovery systems can do the same
> later.

Do you have a more consistent naming in mind for these events?

- Juan
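
For reference, here is a rough userspace-compilable sketch of how the three names above could be turned into a single KEY=VALUE string of the kind a uevent environment carries. This is only an illustration, not the patch itself: the key name ERROR_EVENT, the enum, and both helper functions are hypothetical and exist only to make the naming concrete.

```c
#include <stdio.h>

/* Hypothetical, non-EEH-specific recovery states; names taken from this thread. */
enum recovery_event {
	BEGIN_RECOVERY,
	SUCCESSFUL_RECOVERY,
	FAILED_RECOVERY,
};

/* Map a recovery state to its string name, or NULL if the value is unknown. */
static const char *recovery_event_name(enum recovery_event ev)
{
	switch (ev) {
	case BEGIN_RECOVERY:
		return "BEGIN_RECOVERY";
	case SUCCESSFUL_RECOVERY:
		return "SUCCESSFUL_RECOVERY";
	case FAILED_RECOVERY:
		return "FAILED_RECOVERY";
	}
	return NULL;
}

/*
 * Format "ERROR_EVENT=<name>" into buf.
 * The key "ERROR_EVENT" is an assumption made for this sketch.
 * Returns 0 on success, -1 on an unknown event or if buf is too small.
 */
static int format_recovery_env(char *buf, size_t len, enum recovery_event ev)
{
	const char *name = recovery_event_name(ev);
	int n;

	if (!name)
		return -1;

	n = snprintf(buf, len, "ERROR_EVENT=%s", name);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

In the kernel, a string like this would presumably be placed in the envp array passed to kobject_uevent_env() along with KOBJ_CHANGE, so that EEH, AER, and DPC could all report the same three states and userspace would not need to tell them apart.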