On Fri, Sep 22, 2023 at 8:23 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> > It would be nice if they worked the same, but I suspect that vendors
> > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> > part of their system integrity story.
>
> The file system errors created by a panic (especially an NMI panic)
> could easily be more problematic than a failed PCIe data transfer.
> Even a read that returned ~0u - which can be checked for.
>
> Panicking a system that is converting TDM telephony to RTP for the
> 911 emergency service because a PCIe cable/riser connecting one of the
> TDM boards has become loose doesn't seem ideal.

For kernel native AER the default reaction to errors is reset-and-reinit,
which probably isn't much better for your case. Sounds like you would
want a knob to suppress everything except error reporting so you can
handle it in userspace?

> (Or because the TDM board's fpga has decided it isn't going to respond
> to any accesses until the BARs are set up again...)
>
> The system can carry on with some TDM connections disabled - but that
> is ok because they are all duplicated in case a cable gets cut.

Well that's a relief :)

Oliver