Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

On 8/5/24 01:56, Thomas Gleixner wrote:
> On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
>> On 8/4/24 11:36, Guenter Roeck wrote:
>>>> Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>>       genirq: Set IRQF_COND_ONESHOT in request_irq()
>>>
>>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>>
>>> [    0.000000] =============================================================================
>>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>>> [    0.000000] -----------------------------------------------------------------------------
>
> Do you have a full boot log? It's unclear to me at which point of the boot
> process this happens. Is this before or after the secondary CPUs have
> been brought up?
>
>>> This never stops until the emulation aborts.
>
> Do you have a recipe how to reproduce?
>
>>> Reverting this patch fixes the problem for me.
>>>
>>> I noticed a similar problem in the mainline kernel but it is either spurious there
>>> or the problem has been fixed.
>>
>> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
>> I guess that suggests some kind of race condition.
>>
>> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
>>          struct irq_desc *desc;
>>          int retval;
>>
>> +       udelay(1);
>> +
>>          if (irq == IRQ_NOTCONNECTED)
>>                  return -ENOTCONN;
>
> That all makes absolutely no sense to me.

Same here, really. I can reproduce the problem with v6.10.3, using my configuration,
but whatever debugging I add makes the problem disappear. I had seen the same problem
on mainline with v6.11-rc1-272-g17712b7ea075. Log is at
https://kerneltests.org/builders/qemu-parisc64-master/builds/168/steps/qemubuildcommand/logs/stdio
However, I can no longer reproduce it there. What makes it even more weird / odd
is that I can bisect the problem between v6.10.2 and v6.10.3 and it points to this
commit, but reproducing it outside that chain seems to be all but impossible.

Guenter

> IRQF_COND_ONESHOT has only an effect on shared interrupts, when the
> interrupt was already requested with IRQF_ONESHOT.
>
> If this is really a race then the following must be true:
>
> 1) no delay
>
>     CPU0                                 CPU1
>     request_irq(IRQF_ONESHOT)
>                                          request_irq(IRQF_COND_ONESHOT)
>
> 2) delay
>
>     CPU0                                 CPU1
>                                          request_irq(IRQF_COND_ONESHOT)
>     request_irq(IRQF_ONESHOT)
>
>     In this case the request on CPU 0 fails with -EBUSY ...
>
> Confused
>
>          tglx
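
For readers following the two orderings above, here is a minimal user-space
sketch of the flag check they rely on. It is only loosely modeled on the
compatibility test the kernel applies when a second handler is added to an
already requested shared line; it is not kernel code. The MODEL_* names, the
struct, the helper function and the flag values are all hypothetical and
exist only for this illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the kernel's IRQF_* flags. Values are for this model only. */
#define MODEL_IRQF_SHARED       0x1u
#define MODEL_IRQF_ONESHOT      0x2u
#define MODEL_IRQF_COND_ONESHOT 0x4u

/* One interrupt line, remembering the flags of the first request. */
struct model_irq_line {
    bool requested;      /* has a handler already been installed? */
    unsigned int flags;  /* flags of the first installed handler */
};

/*
 * Try to add a handler with 'flags' to 'line'.
 * Returns 0 on success, -EBUSY on a flag mismatch.
 */
static int model_request_irq(struct model_irq_line *line, unsigned int flags)
{
    if (!line->requested) {
        /* First request on this line: nothing to compare against. */
        line->requested = true;
        line->flags = flags;
        return 0;
    }

    /* Both the existing and the new request must agree to share. */
    if (!((line->flags & flags) & MODEL_IRQF_SHARED))
        return -EBUSY;

    if ((line->flags & MODEL_IRQF_ONESHOT) &&
        (flags & MODEL_IRQF_COND_ONESHOT)) {
        /* Existing handler is ONESHOT: the conditional request adopts it. */
        flags |= MODEL_IRQF_ONESHOT;
    } else if ((line->flags ^ flags) & MODEL_IRQF_ONESHOT) {
        /* ONESHOT on one side only, with nothing to reconcile it. */
        return -EBUSY;
    }

    return 0;
}
```

In this model, requesting with MODEL_IRQF_SHARED | MODEL_IRQF_ONESHOT first
and MODEL_IRQF_SHARED | MODEL_IRQF_COND_ONESHOT second lets both requests
succeed (case 1). Swapping the order makes the later ONESHOT request return
-EBUSY (case 2), which is the failure described above. Neither ordering
explains a slab corruption report by itself.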






