> -----Original Message-----
> From: Finn Thain [mailto:fthain@xxxxxxxxxxxxxxxxxxx]
> Sent: Friday, February 12, 2021 12:58 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> Cc: tanxiaofei <tanxiaofei@xxxxxxxxxx>; jejb@xxxxxxxxxxxxx;
> martin.petersen@xxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx;
> linux-m68k@xxxxxxxxxxxxxxx
> Subject: RE: [Linuxarm] Re: [PATCH for-next 00/32] spin lock usage optimization for SCSI drivers
>
> On Thu, 11 Feb 2021, Song Bao Hua (Barry Song) wrote:
>
> > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > >
> > > > > On Wed, 10 Feb 2021, Song Bao Hua (Barry Song) wrote:
> > > > > >
> > > > > > TBH, that is why m68k is so confusing. irqs_disabled() on m68k
> > > > > > should just reflect whether all interrupts except NMI have been
> > > > > > disabled.
> > > > > >
> > > > > > irqs_disabled() should be consistent with calls to APIs such as
> > > > > > local_irq_disable, local_irq_save, spin_lock_irqsave etc.
> > > > > >
> > > > >
> > > > > When irqs_disabled() returns true, we cannot infer that
> > > > > arch_local_irq_disable() was called. But I have not yet found
> > > > > driver code or core kernel code attempting that inference.
> > > > >
> > > > > > > > Isn't arch_irqs_disabled() a reflection of the status set by
> > > > > > > > the irq-disable APIs?
> > > > > > >
> > > > > > > Why not?
> > > > > >
> > > > > > If so, arch_irqs_disabled() should mean all interrupts have been
> > > > > > masked except NMI, as NMI is unmaskable.
> > > > > >
> > > > >
> > > > > Can you support that claim with a reference to core kernel code or
> > > > > documentation? (If some arch code agrees with you, that's neither
> > > > > here nor there.)
> > > >
> > > > I think the links I shared with you support this. You just don't
> > > > believe them :-)
> > > >
> > >
> > > Your links show that the distinction between fast and slow handlers
> > > was removed. Your links don't support your claim that
> > > "arch_irqs_disabled() should mean all interrupts have been masked".
> > > Where is the code that makes that inference? Where is the
> > > documentation that supports your claim?
> >
> > (1)
> > https://lwn.net/Articles/380931/
> > Looking at all these worries, one might well wonder if a system which
> > *disabled interrupts for all handlers* would function well at all. So it
> > is interesting to note one thing: any system which has the lockdep
> > locking checker enabled has been running all handlers that way for some
> > years now. Many developers and testers run lockdep-enabled kernels, and
> > they are available for some of the more adventurous distributions
> > (Rawhide, for example) as well. So we have quite a bit of test coverage
> > for this mode of operation already.
> >
>
> IIUC, your claim is that CONFIG_LOCKDEP involves code that contains the
> inference, "arch_irqs_disabled() means all interrupts have been masked".
>
> Unfortunately, m68k lacks CONFIG_LOCKDEP support so I can't easily confirm
> this. I suppose there may be other architectures that support both LOCKDEP
> and nested interrupts (?)
>
> Anyway, if you would point to the code that contains said inference, that
> would help a lot.
>
> > (2)
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b738a50a
> >
> > "We run all handlers *with interrupts disabled* and expect them not to
> > enable them. Warn when we catch one who does."
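> >
> > The check that commit added survives in today's irq handling path; it
> > looks roughly like this (paraphrasing kernel/irq/handle.c, not a
> > verbatim copy):
> >
> > 	res = action->handler(irq, action->dev_id);
> >
> > 	/* Handlers must not re-enable interrupts behind our back. */
> > 	if (WARN_ONCE(!irqs_disabled(),
> > 		      "irq %u handler %pS enabled interrupts\n",
> > 		      irq, action->handler))
> > 		local_irq_disable();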
> >
>
> Again, you don't see that warning because irqs_disabled() correctly
> returns true. You can confirm this fact in QEMU or Aranym if you are
> interested.
>
> > (3)
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e58aa3d2d0cc
> > genirq: Run irq handlers *with interrupts disabled*
> >
> > Running interrupt handlers with interrupts enabled can cause stack
> > overflows. That has been observed with multiqueue NICs delivering all
> > their interrupts to a single core. We might band aid that somehow by
> > checking the interrupt stacks, but the real safe fix is to *run the irq
> > handlers with interrupts disabled*.
> >
>
> Again, the stack overflow issue is not applicable. 68000 uses a priority
> mask, like ARM GIC. So there's no arbitrary nesting of interrupt handlers.
>
> In practice stack overflows simply don't occur on m68k. Please do try it.
>
> > All these documents say we are running irq handlers with interrupts
> > disabled. But it seems you think high-prio interrupts don't belong
> > to "interrupts" in those documents :-)
> >
> > That is why we can't reach agreement. I think "interrupts" means all
> > except NMI in these documents, but you insist high-prio IRQs are an
> > exception.
> >
>
> We can't get agreement because you seek to remove functionality without
> justification.
>
> > > > > > > Are all interrupts (including NMI) masked whenever
> > > > > > > arch_irqs_disabled() returns true on your platforms?
> > > > > >
> > > > > > On my platform, once irqs_disabled() is true, all interrupts are
> > > > > > masked except NMI. NMI just ignores spin_lock_irqsave or
> > > > > > local_irq_disable.
> > > > > >
> > > > > > On ARM64, we also have high-priority interrupts, but they run
> > > > > > as PSEUDO_NMI:
> > > > > > https://lwn.net/Articles/755906/
> > > > > >
> > > > >
> > > > > A glance at the ARM GIC specification suggests that your hardware
> > > > > works much like 68000 hardware.
> > > > >
> > > > >     When enabled, a CPU interface takes the highest priority
> > > > >     pending interrupt for its connected processor and determines
> > > > >     whether the interrupt has sufficient priority for it to signal
> > > > >     the interrupt request to the processor. [...]
> > > > >
> > > > >     When the processor acknowledges the interrupt at the CPU
> > > > >     interface, the Distributor changes the status of the interrupt
> > > > >     from pending to either active, or active and pending. At this
> > > > >     point the CPU interface can signal another interrupt to the
> > > > >     processor, to preempt interrupts that are active on the
> > > > >     processor. If there is no pending interrupt with sufficient
> > > > >     priority for signaling to the processor, the interface
> > > > >     deasserts the interrupt request signal to the processor.
> > > > >
> > > > > https://developer.arm.com/documentation/ihi0048/b/
> > > > >
> > > > > Have you considered that Linux/arm might benefit if it could fully
> > > > > exploit hardware features already available, such as the interrupt
> > > > > priority masking feature in the GIC in existing arm systems?
> > > >
> > > > I guess not :-) There are only two levels: IRQ and NMI. Injecting a
> > > > high-prio IRQ level between them makes no sense.
> > > >
> > > > To me, arm64's design is quite clear and causes no confusion.
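> > > >
> > > > The idea, as I understand it, is roughly this (an illustrative
> > > > sketch only, not the real arm64 code; the priority values and the
> > > > helper name are made up, and lower GIC priority values mean higher
> > > > priority):
> > > >
> > > > 	#define PRIO_IRQ_ON	0xe0	/* nothing of interest masked */
> > > > 	#define PRIO_IRQ_OFF	0x60	/* blocks IRQs, not pseudo-NMIs */
> > > > 	#define PRIO_NMI	0x20	/* below the mask, still fires */
> > > >
> > > > 	static inline void sketch_local_irq_disable(void)
> > > > 	{
> > > > 		/* Raise the GIC priority mask instead of setting
> > > > 		 * PSTATE.I, so only pseudo-NMI-priority interrupts
> > > > 		 * can still be delivered. */
> > > > 		gic_write_pmr(PRIO_IRQ_OFF);
> > > > 	}
> > > >
> > > > So the kernel still sees exactly two states, and irqs_disabled()
> > > > means "everything is masked except NMI".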
> > >
> > > Are you saying that the ARM64 hardware design is confusing because it
> > > implements a priority mask, and that's why you had to simplify it with
> > > a pseudo-NMI scheme in software?
> >
> > No, I was not saying that. I think both m68k and arm64 have good
> > hardware designs. It is just that Linux runs irq handlers with
> > interrupts disabled, so ARM64's pseudo-NMI fits Linux better.
> >
>
> So, a platform should do what all the other platforms do because to
> deviate would be too dangerous?
>
> > > > > > On m68k, it seems you mean:
> > > > > > irqs_disabled() is true, but high-priority interrupts can still
> > > > > > arrive; local_irq_disable() can disable high-priority interrupts,
> > > > > > and at that time, irqs_disabled() is also true.
> > > > > >
> > > > > > TBH, this is wrong and confusing on m68k.
> > > > > >
> > > > >
> > > > > Like you, I was surprised when I learned about it. But that
> > > > > doesn't mean it's wrong. The fact that it works should tell you
> > > > > something.
> > > > >
> > > >
> > > > The fact is that m68k lets arch_irqs_disabled() return true to
> > > > pretend all IRQs are disabled while high-priority IRQs are still
> > > > open, and thus "passes" all the sanity checks in genirq and the
> > > > kernel core.
> > > >
> > >
> > > The fact is that m68k has arch_irqs_disabled() return false when all
> > > IRQs are enabled. So there is no bug.
> >
> > But it has arch_irqs_disabled() return true while some interrupts (not
> > NMI) are still open.
> >
> > > > > Things could always be made simpler. But discarding features isn't
> > > > > necessarily an improvement.
> > > >
> > > > This feature could be used by calling local_irq_enable_in_hardirq()
> > > > in those IRQ handlers that want high-priority interrupts to be able
> > > > to preempt them for a while.
> > > >
> > >
> > > So, if one handler is sensitive to interrupt latency, all other
> > > handlers should be modified? I don't think that's workable.
> >
> > I think we can just enable preempt_rt or force threaded_irq, and then
> > raise the priority of the irq thread that is sensitive to latency. No
> > need to touch all the other threads.
> >
> > I also understand your point: if we let one high-prio interrupt preempt
> > a low-priority one, we don't need to change the whole system. But I
> > think Linux prefers threaded_irq or preempt_rt for this kind of
> > problem.
> >
>
> So, some interrupt (or exception) processing happens atomically and the
> rest is deferred to a different execution context. (Not a new idea.)
>
> If you introduce latency in the former context you can't win it back in
> the latter. Your solution fails because it adds latency to high priority
> handlers.
>
> > > In any case, what you're describing is a completely different nested
> > > interrupt scheme that would defeat the priority level mechanism that
> > > the hardware provides us with.
> >
> > Yes. Indeed.
> >
> > > > It shouldn't hide somewhere and cause confusion.
> > >
> > > The problem is hiding so well that no-one has found it! I say it
> > > doesn't exist.
> >
> > Long long ago (before 2.6.38), we had kernels supporting IRQF_DISABLED,
> > and nested interrupts were widely supported, but systems still ran well
> > in most cases. That means nested interrupts don't really matter in most
> > cases. That is why m68k also runs well even though it still nests.
> >
>
> No, m68k runs well because it uses priority masking. It is not because
> some cases are untested.
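>
> To spell out how that works: the 68000 SR contains a 3-bit interrupt
> priority level (IPL), and an interrupt is delivered only if its level
> exceeds the current IPL (level 7 is effectively NMI). The irqs_disabled()
> test on m68k amounts to something like this (a simplified sketch, not
> the literal arch code):
>
> 	/* "disabled" here means the IPL field in the status register
> 	 * is non-zero, i.e. at least some interrupt levels are masked. */
> 	static inline bool sr_irqs_disabled(unsigned short sr)
> 	{
> 		return (sr & 0x0700) != 0;	/* SR bits 8-10 hold the IPL */
> 	}
>
> So inside a level-5 handler the IPL is 5 and irqs_disabled() is true,
> yet a level-6 interrupt can still preempt the handler.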
>
> Your hardware may not have been around for 4 decades but it implements
> the same capability because the design is known to work.
>
> > > > On the other hand, those who care about realtime should use
> > > > threaded IRQs and let IRQ threads preempt each other.
> > > >
> > >
> > > Yes. And those threads also have priority levels.
> >
> > Finn, I am not an m68k guy. Would you help check whether this patch
> > triggers a warning on m68k? Maybe we can use that warning to start a
> > discussion on the genirq mailing list, if you are happy with that.
> > Thanks very much.
> >
> > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> > index 7c9d6a2d7e90..b8ca27555c76 100644
> > --- a/include/linux/hardirq.h
> > +++ b/include/linux/hardirq.h
> > @@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
> >   */
> >  #define __irq_enter()					\
> >  	do {						\
> > +		WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
> >  		preempt_count_add(HARDIRQ_OFFSET);	\
> >  		lockdep_hardirq_enter();		\
> >  		account_hardirq_enter(current);		\
> > @@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
> >   */
> >  #define __irq_enter_raw()				\
> >  	do {						\
> > +		WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
> >  		preempt_count_add(HARDIRQ_OFFSET);	\
> >  		lockdep_hardirq_enter();		\
> >  	} while (0)
> >
>
> If you think that lockdep or some other code somewhere should be
> protected in this way, perhaps you can point to that code. Otherwise,
> your patch seems to lack any justification.

Ok, I am not saying this patch is right. I was actually expecting a log
with which to start the discussion in genirq.

Anyway, I think this is a very important problem that we need to clarify
thoroughly in genirq, by starting a discussion with the people involved.
If irq handlers are able to run with some high-prio interrupts enabled
on a platform like m68k, that needs to be documented somewhere.

> > Best Regards
> > Barry
> >

Thanks
Barry
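P.S. To make the threaded-irq approach quoted above concrete, this is
the kind of thing I have in mind (a sketch with made-up names, not code
from any real driver):

	#include <linux/interrupt.h>

	/* Hard-irq part: runs with interrupts disabled, kept minimal. */
	static irqreturn_t mydev_quick_check(int irq, void *dev_id)
	{
		return IRQ_WAKE_THREAD;		/* defer the real work */
	}

	/* Threaded part: runs in a SCHED_FIFO kthread, so a more
	 * latency-sensitive irq thread with a higher RT priority can
	 * preempt it at any point. */
	static irqreturn_t mydev_thread_fn(int irq, void *dev_id)
	{
		/* ... heavy processing ... */
		return IRQ_HANDLED;
	}

	err = request_threaded_irq(irq, mydev_quick_check, mydev_thread_fn,
				   IRQF_ONESHOT, "mydev", dev);

Then only the latency-sensitive irq thread needs its priority raised
(e.g. with chrt on its PID); the other handlers stay untouched.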