Hi Boris and Petr, first of all thanks for your great analysis and really sorry for the huge delay in my response. Below I'm pasting the 2 relevant responses from both Petr and Boris. On 22/11/2022 12:06, Borislav Petkov wrote: > On Tue, Nov 22, 2022 at 10:33:12AM -0300, Guilherme G. Piccoli wrote: > > Leaving in the whole thing for newly added people. > >> On 18/09/2022 11:10, Guilherme G. Piccoli wrote: >>> On 19/08/2022 19:17, Guilherme G. Piccoli wrote: >>>> The altera_edac panic notifier performs some data collection with >>>> regards errors detected; such code relies in the regmap layer to >>>> perform reads/writes, so the code is abstracted and there is some >>>> risk level to execute that, since the panic path runs in atomic >>>> context, with interrupts/preemption and secondary CPUs disabled. >>>> >>>> Users want the information collected in this panic notifier though, >>>> so in order to balance the risk/benefit, let's skip the altera panic >>>> notifier if kdump is loaded. While at it, remove a useless header >>>> and encompass a macro inside the sole ifdef block it is used. >>>> >>>> Cc: Borislav Petkov <bp@xxxxxxxxx> >>>> Cc: Petr Mladek <pmladek@xxxxxxxx> >>>> Cc: Tony Luck <tony.luck@xxxxxxxxx> >>>> Acked-by: Dinh Nguyen <dinguyen@xxxxxxxxxx> >>>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx> >>>> >>>> --- >>>> >>>> V3: >>>> - added the ack tag from Dinh - thanks! >>>> - had a good discussion with Boris about that in V2 [0], >>>> hopefully we can continue and reach a consensus in this V3. >>>> [0] https://lore.kernel.org/lkml/46137c67-25b4-6657-33b7-cffdc7afc0d7@xxxxxxxxxx/ >>>> >>>> V2: >>>> - new patch, based on the discussion in [1]. >>>> [1] https://lore.kernel.org/lkml/62a63fc2-346f-f375-043a-fa21385279df@xxxxxxxxxx/ >>>> >>>> [...] >>> >>> Hi Dinh, Tony, Boris - sorry for the ping. >>> >>> Appreciate reviews on this one - Dinh already ACKed the patch but Boris >>> raised some points in the past version [0], so any opinions or >>> discussions are welcome! >> >> >> Hi folks, monthly ping heheh >> Apologies for the re-pings, please let me know if there is anything >> required to move on this patch. > > Looking at this again, I really don't like the sprinkling of > > if (kexec_crash_loaded()) > > in unrelated code. And I still think that the real fix here is to kill > this > > edac->panic_notifier > > thing. And replace it with simply logging the error from the double bit > error interrupt handle. That DBERR IRQ thing altr_edac_a10_irq_handler(). > Because this is what this panic notifier does - dump double-bit errors. > > Now, if Dinh doesn't move, I guess we can ask Tony and/or Rabara (he has > sent a patch for this driver recently and Altera belongs to Intel now) > to find someone who can test such a change and we (you could give it a > try first :)) can do that change. > > Thx. > On 09/12/2022 13:03, Petr Mladek wrote:> [...]> > I have read the discussion about v2 [1] and this looks like a bad > approach from my POV. > > My understanding is that the information provided by this notifier > could not be found in the crashdump. It means that people really > want to run this before crashdump in principle. > > Of course, there is the question how much safe this code is. I mean > if the panic() code path might get blocked here. > > I see two possibilities. > > The best solution would be if we know that this is "always" safe or if > it can be done a safe way. Then we could keep it as it is or implement > the safe way. > > Alternative solution would be to create a kernel parameter that > would enable/disable this particular report when kdump is enabled. > The question would be the default. It would depend on how risky > the code is and how useful the information is. > > [1] https://lore.kernel.org/r/20220719195325.402745-11-gpiccoli@xxxxxxxxxx > So, for me Petr approach is the more straightforward, though we could rethink the idea of this notifier being...a notifier, as suggest Boris heh Anyway, what I plan to do is: I'll re-submit a simple clean-up for this code (header / ifdef stuff), not functional-changing the code path. After that, when re-submitting the V2 or the notifiers refactor (which I'm pending for some good months =O ), I'll deal with this code properly, factoring the ideas and proposing a meaningful change. So, let's discard this patch for now. Thanks again, Guilherme