RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers

On Thu, 11 Feb 2021, Song Bao Hua (Barry Song) wrote:

On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:

On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:

TBH, that is why m68k is so confusing. irqs_disabled() on m68k 
should simply reflect whether all interrupts (except NMI) have 
been disabled.

irqs_disabled() should be consistent with calls to APIs such as 
local_irq_disable(), local_irq_save(), spin_lock_irqsave(), etc.


When irqs_disabled() returns true, we cannot infer that 
arch_local_irq_disable() was called. But I have not yet found 
driver code or core kernel code attempting that inference.
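
To illustrate: on m68k, irqs_disabled() is derived from the interrupt 
priority level (IPL) field of %sr. Roughly, paraphrased from 
arch/m68k/include/asm/irqflags.h (the exact masks vary by platform and 
kernel version):

static inline unsigned long arch_local_save_flags(void)
{
	unsigned long flags;
	asm volatile ("movew %%sr,%0" : "=d" (flags) : : "memory");
	return flags;
}

static inline bool arch_irqs_disabled_flags(unsigned long flags)
{
	/* True whenever IPL > 0. At IPL 3, for instance, this returns
	 * true even though levels 4-7 can still be delivered. */
	return (flags & ~ALLOWINT) != 0;
}

So a true result means "some interrupt levels are masked", not 
"arch_local_irq_disable() was called".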


Isn't arch_irqs_disabled() supposed to reflect the status of the 
irq-disable APIs?


Why not?

If so, arch_irqs_disabled() should mean that all interrupts have been 
masked except NMI, as NMI is unmaskable.


Can you support that claim with a reference to core kernel code or 
documentation? (If some arch code agrees with you, that's neither 
here nor there.)

I think the links I shared with you support this. You just don't 
believe them :-)


Your links show that the distinction between fast and slow handlers 
was removed. Your links don't support your claim that 
"arch_irqs_disabled() should mean all interrupts have been masked". 
Where is the code that makes that inference? Where is the 
documentation that supports your claim?

(1)
https://lwn.net/Articles/380931/
Looking at all these worries, one might well wonder if a system which 
*disabled interrupts for all handlers* would function well at all. So it 
is interesting to note one thing: any system which has the lockdep 
locking checker enabled has been running all handlers that way for some 
years now. Many developers and testers run lockdep-enabled kernels, and 
they are available for some of the more adventurous distributions 
(Rawhide, for example) as well. So we have quite a bit of test coverage 
for this mode of operation already.


IIUC, your claim is that CONFIG_LOCKDEP involves code that contains the 
inference, "arch_irqs_disabled() means all interrupts have been masked".

Unfortunately, m68k lacks CONFIG_LOCKDEP support, so I can't easily 
confirm this. I suppose there may be other architectures that support 
both LOCKDEP and nested interrupts (?)

Anyway, if you would point to the code that contains said inference, that 
would help a lot.

(2)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b738a50a

"We run all handlers *with interrupts disabled* and expect them not to
enable them. Warn when we catch one who does."


Again, you don't see that warning because irqs_disabled() correctly 
returns true. You can confirm this fact in QEMU or Aranym if you are 
interested.
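
For reference, the check that commit added looks something like this 
in kernel/irq/handle.c (paraphrased from __handle_irq_event_percpu(); 
the exact form varies between kernel versions):

	res = action->handler(irq, action->dev_id);
	...
	if (WARN_ONCE(!irqs_disabled(),
		      "irq %u handler %pS enabled interrupts\n",
		      irq, action->handler))
		local_irq_disable();

On m68k the handler runs with the IPL raised, irqs_disabled() stays 
true, and the warning never fires.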

(3) 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e58aa3d2d0cc
genirq: Run irq handlers *with interrupts disabled*

Running interrupt handlers with interrupts enabled can cause stack 
overflows. That has been observed with multiqueue NICs delivering all 
their interrupts to a single core. We might band aid that somehow by 
checking the interrupt stacks, but the real safe fix is to *run the irq 
handlers with interrupts disabled*.


Again, the stack overflow issue is not applicable. The 68000 uses a 
priority mask, like the ARM GIC, so there's no arbitrary nesting of 
interrupt handlers: a level-N handler runs at IPL N and can only be 
preempted by a strictly higher level, which bounds the nesting depth.

In practice, stack overflows simply don't occur on m68k. Please do try it.


All these documents say we are running irq handlers with interrupts
disabled, but it seems you think high-prio interrupts don't count as
"interrupts" in those documents :-)

That is why we can't reach agreement. I think "interrupts" means 
everything except NMI in these documents, but you insist that 
high-prio IRQs are an exception.


We can't get agreement because you seek to remove functionality without 
justification.



Are all interrupts (including NMI) masked whenever 
arch_irqs_disabled() returns true on your platforms?

On my platform, once irqs_disabled() is true, all interrupts are 
masked except NMI. NMI simply ignores spin_lock_irqsave() and 
local_irq_disable().

On ARM64, we also have high-priority interrupts, but they run as
pseudo-NMI:
https://lwn.net/Articles/755906/
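
For completeness: as far as I remember, pseudo-NMI has to be compiled 
in and enabled on the kernel command line, roughly like this (the 
option names may differ between kernel versions):

CONFIG_ARM64_PSEUDO_NMI=y          # build-time support
irqchip.gicv3_pseudo_nmi=1         # boot-time switch on the command line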


A glance at the ARM GIC specification suggests that your hardware 
works much like 68000 hardware.

   When enabled, a CPU interface takes the highest priority 
   pending interrupt for its connected processor and determines 
   whether the interrupt has sufficient priority for it to signal 
   the interrupt request to the processor. [...]

   When the processor acknowledges the interrupt at the CPU 
   interface, the Distributor changes the status of the interrupt 
   from pending to either active, or active and pending. At this 
   point the CPU interface can signal another interrupt to the 
   processor, to preempt interrupts that are active on the 
   processor. If there is no pending interrupt with sufficient 
   priority for signaling to the processor, the interface 
   deasserts the interrupt request signal to the processor.

https://developer.arm.com/documentation/ihi0048/b/

Have you considered that Linux/arm might benefit if it could fully 
exploit hardware features already available, such as the interrupt 
priority masking feature in the GIC in existing arm systems?

I guess not :-) There are only two levels: IRQ and NMI. Injecting a 
high-prio IRQ level between them makes no sense.

To me, arm64's design is quite clear and not at all confusing.


Are you saying that the ARM64 hardware design is confusing because it 
implements a priority mask, and that's why you had to simplify it with 
a pseudo-nmi scheme in software?

No, I was not saying that. I think both m68k and arm64 have good 
hardware designs. It is just that Linux runs irq handlers with 
interrupts disabled, so arm64's pseudo-NMI is better adapted to 
Linux.


So, a platform should do what all the other platforms do because to 
deviate would be too dangerous?


On m68k, it seems you mean: irqs_disabled() is true, but 
high-priority interrupts can still come in; local_irq_disable() can 
disable high-priority interrupts, and at that time irqs_disabled() 
is also true.

TBH, this is wrong and confusing on m68k.


Like you, I was surprised when I learned about it. But that 
doesn't mean it's wrong. The fact that it works should tell you 
something.


The fact is that m68k lets arch_irqs_disabled() return true to 
pretend that all IRQs are disabled while high-priority IRQs are still 
open, and thus "passes" all the sanity checks in genirq and the kernel core.


The fact is that m68k has arch_irqs_disabled() return false when all 
IRQs are enabled. So there is no bug.

But it has arch_irqs_disabled() return true while some interrupts (not 
NMI) are still open.


Things could always be made simpler. But discarding features isn't 
necessarily an improvement.

This feature could be used by calling local_irq_enable_in_hardirq() 
in those IRQ handlers that want high-priority interrupts to be able 
to preempt them for a while. Something like the sketch below.
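
A hypothetical sketch (slow_handler and do_lengthy_work are made-up 
names, just to show the shape; local_irq_enable_in_hardirq() is a 
no-op under LOCKDEP and plain local_irq_enable() otherwise, see 
include/linux/interrupt.h):

static irqreturn_t slow_handler(int irq, void *dev_id)
{
	/* Let higher-priority interrupts preempt the lengthy part. */
	local_irq_enable_in_hardirq();

	do_lengthy_work(dev_id);	/* hypothetical helper */

	/* genirq warns if a handler returns with interrupts enabled,
	 * so mask them again before returning. */
	local_irq_disable();
	return IRQ_HANDLED;
}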


So, if one handler is sensitive to interrupt latency, all other 
handlers should be modified? I don't think that's workable.

I think we should just enable PREEMPT_RT or force threaded IRQs, and 
then raise the priority of the irq thread that is sensitive to latency. 
No need to touch all threads.

I also understand your point: if we let one high-prio interrupt preempt 
a low-priority one, then we don't need to change the whole system. But I 
think Linux prefers threaded IRQs or PREEMPT_RT for this kind of 
problem.
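
A rough sketch of what I mean (all the names here are made up):

/* The hard-irq part stays minimal; the latency-sensitive work moves
 * into a kernel thread whose real-time priority can then be raised
 * independently, e.g. "chrt -f -p 80 <pid of irq/N-mydev>". */
static irqreturn_t mydev_quick_check(int irq, void *dev_id)
{
	/* Ack the hardware, then defer the real work to the thread. */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t mydev_thread_fn(int irq, void *dev_id)
{
	/* Runs in process context, preemptible by higher-prio threads. */
	return IRQ_HANDLED;
}

ret = request_threaded_irq(irq, mydev_quick_check, mydev_thread_fn,
			   IRQF_ONESHOT, "mydev", dev);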


So, some interrupt (or exception) processing happens atomically and the 
rest is deferred to a different execution context. (Not a new idea.)

If you introduce latency in the former context, you can't win it back in 
the latter. Your solution fails because it adds latency to the 
high-priority handlers.


In any case, what you're describing is a completely different nested 
interrupt scheme that would defeat the priority level mechanism that 
the hardware provides us with.

Yes. Indeed.


It shouldn't hide somewhere and cause confusion.


The problem is hiding so well that no-one has found it! I say it 
doesn't exist.

Long ago (before 2.6.38), we had kernels supporting IRQF_DISABLED, 
and nested interrupts were widely supported, but systems still ran well 
in most cases. That means nested interrupts don't really matter in most 
cases. That is why m68k also runs well even though it still nests 
interrupts.


No, m68k runs well because it uses priority masking. It is not because 
some cases are untested.

Your hardware may not have been around for 4 decades but it implements the 
same capability because the design is known to work.


On the other hand, those who care about realtime should use threaded 
IRQ and let IRQ threads preempt each other.


Yes. And those threads also have priority levels.

Finn, I am not an m68k guy; would you help check whether this triggers a
warning on m68k? Maybe, if you are happy with that, we can discuss this
question on the genirq mailing list starting from this warning. Thanks very much.

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..b8ca27555c76 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()                                  \
        do {                                            \
+               WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
                preempt_count_add(HARDIRQ_OFFSET);      \
                lockdep_hardirq_enter();                \
                account_hardirq_enter(current);         \
@@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter_raw()                              \
        do {                                            \
+               WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
                preempt_count_add(HARDIRQ_OFFSET);      \
                lockdep_hardirq_enter();                \
        } while (0)


If you think that lockdep or some other code somewhere should be protected 
in this way, perhaps you can point to that code. Otherwise, your patch 
seems to lack any justification.

Best Regards
Barry

