On Fri, Apr 19, 2024 at 06:00:24AM +0200, Thomas Gleixner wrote: > On Mon, Apr 15 2024 at 13:43, Jacob Pan wrote: > > On Mon, 15 Apr 2024 11:53:58 -0700, Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> wrote: > >> On Thu, 11 Apr 2024 18:51:14 +0200, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > >> > If we really care then we do it proper for _all_ of them. Something like > >> > the uncompiled below. There is certainly a smarter way to do the build > >> > thing, but my kbuild foo is rusty. > >> I too had the concern of the wasting system vectors, but did not know how > >> to fix it. But now your code below works well. Tested without KVM in > >> .config to show the gaps: > >> > >> In VECTOR IRQ domain. > >> > >> BEFORE: > >> System: 46: 0-31,50,235-236,244,246-255 > >> > >> AFTER: > >> System: 46: 0-31,50,241-242,245-255 > >> > >> The only gap is MANAGED_IRQ_SHUTDOWN_VECTOR(243), which is expected on a > >> running system. > >> > >> Verified in irqvectors.s: .ascii "->MANAGED_IRQ_SHUTDOWN_VECTOR $243 > >> > >> POSTED MSI/first system vector moved up from 235 to 241 for this case. > >> > >> Will try to let tools/arch/x86/include/asm/irq_vectors.h also use it > >> instead of manually copy over each time. Any suggestions greatly > >> appreciated. > >> > > On a second thought, if we make system IRQ vector determined at compile > > time based on different CONFIG options, will it break userspace tools such > > as perf? More importantly the rule of not breaking userspace. The rule for tools/perf is "don't impose _any requirement_ on the kernel developers, they don't have to test if any change they do outside of tools/ will break something inside tools/." > tools/arch/x86/include/asm/irq_vectors.h is only used to generate the > list of system vectors for pretty output. And your change already broke > that. Yeah, I even moved that from tools/arch/x86/include/asm/irq_vectors.h to tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h (for next merge window). Having it in tools/arch/x86/include/asm/irq_vectors.h was a bad decision as it, as you mentinoned, is only used to generate string tables: ⬢[acme@toolbox perf-tools-next]$ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh static const char *x86_irq_vectors[] = { [0x02] = "NMI", [0x80] = "IA32_SYSCALL", [0xec] = "LOCAL_TIMER", [0xed] = "HYPERV_STIMER0", [0xee] = "HYPERV_REENLIGHTENMENT", [0xef] = "MANAGED_IRQ_SHUTDOWN", [0xf0] = "POSTED_INTR_NESTED", [0xf1] = "POSTED_INTR_WAKEUP", [0xf2] = "POSTED_INTR", [0xf3] = "HYPERVISOR_CALLBACK", [0xf4] = "DEFERRED_ERROR", [0xf6] = "IRQ_WORK", [0xf7] = "X86_PLATFORM_IPI", [0xf8] = "REBOOT", [0xf9] = "THRESHOLD_APIC", [0xfa] = "THERMAL_APIC", [0xfb] = "CALL_FUNCTION_SINGLE", [0xfc] = "CALL_FUNCTION", [0xfd] = "RESCHEDULE", [0xfe] = "ERROR_APIC", [0xff] = "SPURIOUS_APIC", }; ⬢[acme@toolbox perf-tools-next]$ Used in: root@number:~# perf trace -a -e irq_vectors:irq_work_entry/max-stack=32/ --max-events=1 0.000 kworker/u57:0-/9912 irq_vectors:irq_work_entry(vector: IRQ_WORK) __sysvec_irq_work ([kernel.kallsyms]) __sysvec_irq_work ([kernel.kallsyms]) sysvec_irq_work ([kernel.kallsyms]) asm_sysvec_irq_work ([kernel.kallsyms]) _raw_spin_unlock_irqrestore ([kernel.kallsyms]) dma_fence_wait_timeout ([kernel.kallsyms]) intel_atomic_commit_tail ([kernel.kallsyms]) process_one_work ([kernel.kallsyms]) worker_thread ([kernel.kallsyms]) kthread ([kernel.kallsyms]) ret_from_fork ([kernel.kallsyms]) ret_from_fork_asm ([kernel.kallsyms]) root@number:~# But as the original cset introducing this explains, these irq_vectors: tracepoins operate on just one of the vectors, so irq_work_entry(vector: IRQ_WORK), irq_vectors:reschedule_exit(vector: RESCHEDULE), etc. > The obvious solution to that is to expose that list in sysfs for > consumption by perf. nah, the best thing these days is stop using 'int' for vector and use 'enum irq_vector', then since we have BTF we can use that to do the enum -> string translation, like with (using /sys/kernel/btf/vmlinux, that is pretty much available everywhere these days): root@number:~# pahole clocksource_ids enum clocksource_ids { CSID_GENERIC = 0, CSID_ARM_ARCH_COUNTER = 1, CSID_MAX = 2, }; root@number:~# pahole skb_drop_reason | head enum skb_drop_reason { SKB_NOT_DROPPED_YET = 0, SKB_CONSUMED = 1, SKB_DROP_REASON_NOT_SPECIFIED = 2, SKB_DROP_REASON_NO_SOCKET = 3, SKB_DROP_REASON_PKT_TOO_SMALL = 4, SKB_DROP_REASON_TCP_CSUM = 5, SKB_DROP_REASON_SOCKET_FILTER = 6, SKB_DROP_REASON_UDP_CSUM = 7, SKB_DROP_REASON_NETFILTER_DROP = 8, root@number:~# Then its easy to go from 0 to CSID_GENERIC, etc. ⬢[acme@toolbox pahole]$ perf stat -e cycles pahole skb_drop_reason > /dev/null Performance counter stats for 'pahole skb_drop_reason': 6,095,427 cpu_atom/cycles:u/ (2.82%) 103,694,633 cpu_core/cycles:u/ (97.18%) 0.039031759 seconds time elapsed 0.016028000 seconds user 0.023007000 seconds sys ⬢[acme@toolbox pahole]$ - Arnaldo > But we don't have to do any of that right away. It's an orthogonal > issue. Just waste the extra system vector to start with and then we can > add the compile time dependend change on top if we really care about > gaining back the vectors. > > Thanks, > > tglx