On Sat, Nov 11, 2023 at 08:16:39PM -0800, Jacob Pan wrote:

> +static __always_inline inline void handle_pending_pir(struct pi_desc *pid, struct pt_regs *regs)
> +{

__always_inline means that... (A)

> +        int i, vec = FIRST_EXTERNAL_VECTOR;
> +        u64 pir_copy[4];
> +
> +        /*
> +         * Make a copy of PIR which contains IRQ pending bits for vectors,
> +         * then invoke IRQ handlers for each pending vector.
> +         * If any new interrupts were posted while we are processing, will
> +         * do again before allowing new notifications. The idea is to
> +         * minimize the number of the expensive notifications if IRQs come
> +         * in a high frequency burst.
> +         */
> +        for (i = 0; i < 4; i++)
> +                pir_copy[i] = raw_atomic64_xchg((atomic64_t *)&pid->pir_l[i], 0);
> +
> +        /*
> +         * Ideally, we should start from the high order bits set in the PIR
> +         * since each bit represents a vector. Higher order bit position means
> +         * the vector has higher priority. But external vectors are allocated
> +         * based on availability not priority.
> +         *
> +         * EOI is included in the IRQ handlers call to apic_ack_irq, which
> +         * allows higher priority system interrupt to get in between.
> +         */
> +        for_each_set_bit_from(vec, (unsigned long *)&pir_copy[0], 256)
> +                call_irq_handler(vec, regs);
> +
> +}
> +
> +/*
> + * Performance data shows that 3 is good enough to harvest 90+% of the benefit
> + * on high IRQ rate workload.
> + * Alternatively, could make this tunable, use 3 as default.
> + */
> +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> +
> +/*
> + * For MSIs that are delivered as posted interrupts, the CPU notifications
> + * can be coalesced if the MSIs arrive in high frequency bursts.
> + */
> +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> +{
> +        struct pt_regs *old_regs = set_irq_regs(regs);
> +        struct pi_desc *pid;
> +        int i = 0;
> +
> +        pid = this_cpu_ptr(&posted_interrupt_desc);
> +
> +        inc_irq_stat(posted_msi_notification_count);
> +        irq_enter();
> +
> +        while (i++ < MAX_POSTED_MSI_COALESCING_LOOP) {
> +                handle_pending_pir(pid, regs);
> +
> +                /*
> +                 * If there are new interrupts posted in PIR, do again. If
> +                 * nothing pending, no need to wait for more interrupts.
> +                 */
> +                if (is_pir_pending(pid))

So this reads those same 4 words we xchg in handle_pending_pir(), right?

> +                        continue;
> +                else
> +                        break;
> +        }
> +
> +        /*
> +         * Clear outstanding notification bit to allow new IRQ notifications,
> +         * do this last to maximize the window of interrupt coalescing.
> +         */
> +        pi_clear_on(pid);
> +
> +        /*
> +         * There could be a race of PI notification and the clearing of ON bit,
> +         * process PIR bits one last time such that handling the new interrupts
> +         * are not delayed until the next IRQ.
> +         */
> +        if (unlikely(is_pir_pending(pid)))
> +                handle_pending_pir(pid, regs);

(A) ... we get _two_ copies of that thing in this function. Does that make sense?

> +
> +        apic_eoi();
> +        irq_exit();
> +        set_irq_regs(old_regs);
> +}
> #endif /* X86_POSTED_MSI */

Would it not make more sense to write things something like:

bool handle_pending_pir(struct pi_desc *pid, struct pt_regs *regs)
{
        bool handled = false;
        u64 pir_copy[4];
        int i;

        for (i = 0; i < 4; i++) {
                if (!pid->pir_l[i]) {
                        pir_copy[i] = 0;
                        continue;
                }

                pir_copy[i] = arch_xchg(&pid->pir_l[i], 0);
                handled |= true;
        }

        if (!handled)
                return handled;

        for_each_set_bit()
                ....

        return handled;
}

sysvec_posted_blah_blah()
{
        bool done = false;
        bool handled;
        int loops = 0;

        for (;;) {
                handled = handle_pending_pir(pid, regs);
                if (done)
                        break;
                if (!handled || ++loops > MAX_LOOPS) {
                        pi_clear_on(pid);
                        /* once more after clear_on */
                        done = true;
                }
        }
}

Hmm?
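
To make the intended control flow concrete, below is a stand-alone user-space sketch of that structure. It is illustration only: C11 atomics stand in for raw_atomic64_xchg()/arch_xchg(), and everything prefixed mock_, plus NR_PIR_WORDS, the open-coded bit scan and the main() driver, is made up for the example, so only the loop shape mirrors what the real handler would do.

/*
 * Illustration only -- a user-space mock of the xchg-and-rescan
 * coalescing flow, NOT kernel code.  The pi_desc layout, the vector
 * dispatch and the ON bit are all faked with C11 atomics so the
 * control flow can be compiled and stepped through.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_PIR_WORDS    4
#define MAX_LOOPS       3

struct mock_pi_desc {
        _Atomic uint64_t pir[NR_PIR_WORDS];     /* stands in for pid->pir_l[] */
        _Atomic int on;                         /* stands in for the ON bit */
};

static void mock_irq_handler(int vec)
{
        printf("handling vector %d\n", vec);
}

/* Harvest pending bits; report whether anything was dispatched. */
static bool handle_pending_pir(struct mock_pi_desc *pid)
{
        uint64_t pir_copy[NR_PIR_WORDS];
        bool handled = false;

        for (int i = 0; i < NR_PIR_WORDS; i++) {
                /* Skip the atomic op (and the extra copy) when the word is clear. */
                if (!atomic_load(&pid->pir[i])) {
                        pir_copy[i] = 0;
                        continue;
                }
                pir_copy[i] = atomic_exchange(&pid->pir[i], 0);
                handled = true;
        }

        if (!handled)
                return false;

        for (int i = 0; i < NR_PIR_WORDS; i++)
                for (int bit = 0; bit < 64; bit++)
                        if (pir_copy[i] & (1ULL << bit))
                                mock_irq_handler(i * 64 + bit);

        return true;
}

/*
 * The notification "handler": keep harvesting while work arrives,
 * then clear ON and scan one final time to close the race window.
 */
static void posted_msi_notification(struct mock_pi_desc *pid)
{
        bool done = false;
        int loops = 0;

        for (;;) {
                bool handled = handle_pending_pir(pid);

                if (done)
                        break;
                if (!handled || ++loops > MAX_LOOPS) {
                        atomic_store(&pid->on, 0);      /* pi_clear_on() */
                        /* one more pass after clearing ON */
                        done = true;
                }
        }
}

int main(void)
{
        struct mock_pi_desc pid = {0};

        atomic_store(&pid.pir[0], 1ULL << 34);  /* pretend vector 34 is pending */
        atomic_store(&pid.on, 1);
        posted_msi_notification(&pid);
        return 0;
}

The point being that handle_pending_pir() reports whether it harvested anything, so the extra pass after pi_clear_on() falls out of the same loop instead of duplicating an __always_inline body in the sysvec handler.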