On Fri, 2021-04-02 at 18:11 +0100, Mark Brown wrote:
> On Thu, Mar 11, 2021 at 12:22:36PM +0200, Matti Vaittinen wrote:
>
> > +	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
> > +		if (d->die)
> > +			ret = d->die(rid);
> > +		else
> > +			BUG();
> > +
> > +		/*
> > +		 * If the 'last resort' IC recovery failed we will have
> > +		 * nothing else left to do...
> > +		 */
> > +		BUG_ON(ret);
>
> This isn't good... we should be trying to provide more system level
> handling of this, if nothing else it's quite possibly not a software
> bug here but rather a hardware failure.  An explicit message about
> what happened would be more likely to be understood as a hardware
> failure,

I do agree. I'll add a print in the next version.

> and something which allows handling such as initiating a system
> shutdown would be good as well - I'm not sure if there's any existing
> mechanism to plumb userspace into, or perhaps some sort of policy
> configurable via sysfs.

I like the idea but don't know of any such existing mechanism. The
input system power-key event is the closest that comes to my mind - but
I don't think that would be quite right. Additionally, I am unsure what
level of user-space functionality can be expected to work. Maybe the
severity of the configured notifications should be used to decide
whether to do in-kernel handling or to alert user-space. Anyways, that
is something that requires further pondering - I'd propose improving
this later.

Best Regards
	Matti Vaittinen