On 12/12/2021 21.21, Marc Zyngier wrote:
>> +/* MPIDR fields */
>> +#define MPIDR_CPU GENMASK(7, 0)
>> +#define MPIDR_CLUSTER GENMASK(15, 8)
>
> This should be defined in terms of MPIDR_AFFINITY_LEVEL() and co.

Yeah, I found out about that macro from your PMU driver... :)

>> +static const struct aic_info aic1_fipi_info = {
>> +	.version = 1,
>> +
>> +	.fast_ipi = true,
>
> Do you anticipate multiple feature flags like this? If so, maybe we
> should consider biting the bullet and making this an unsigned long
> populated with discrete flags.
>
> Not something we need to decide now though.

Probably not, but who knows! It's easy to change it later, though.

>> 	if (read_sysreg_s(SYS_IMP_APL_IPI_SR_EL1) & IPI_SR_PENDING) {
>> -		pr_err_ratelimited("Fast IPI fired. Acking.\n");
>> -		write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1);
>> +		if (aic_irqc->info.fast_ipi) {
>
> On the other hand, this is likely to hit on the fast path. Given that
> we know at probe time whether we support SR-based IPIs, we can turn
> this into a static key and save a few fetches on every IPI. It applies
> everywhere you look at this flag at runtime.

Good point, I'll see about refactoring this to use static keys.

>> +static void aic_ipi_send_fast(int cpu)
>> +{
>> +	u64 mpidr = cpu_logical_map(cpu);
>> +	u64 my_mpidr = cpu_logical_map(smp_processor_id());
>
> This is the equivalent of reading MPIDR_EL1. My gut feeling is that it
> is a bit faster to access the sysreg than a percpu lookup, a function
> call and another memory access.

Yeah, I saw other IRQ drivers doing this, but I wasn't sure it made
sense over just reading MPIDR_EL1... I'll switch to that.

>> +	u64 idx = FIELD_GET(MPIDR_CPU, mpidr);
>> +
>> +	if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster)
>> +		write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx),
>> +			       SYS_IMP_APL_IPI_RR_LOCAL_EL1);
>> +	else
>> +		write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster),
>> +			       SYS_IMP_APL_IPI_RR_GLOBAL_EL1);
>
> Don't you need an ISB, either here or in the two callers? At the
> moment, I don't see what will force the execution of these writes, and
> they could be arbitrarily delayed.

Is there any requirement for timeliness when sending IPIs? They're
going to another CPU, after all; they could be arbitrarily delayed
anyway because it has FIQs masked.

>> -	if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit)
>> -		aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id()));
>> +	if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) {
>> +		if (ic->info.fast_ipi)
>> +			aic_ipi_send_fast(smp_processor_id());
>
> nit: if this is common enough, maybe having an aic_ipi_send_self_fast
> could be better. Needs evaluation though.

I'll add some prints to see how common self-IPIs are when running
common workloads; let's see. If it's common enough, it's easy enough
to add.

>> +	irqc->info = *(struct aic_info *)match->data;
>
> Why the copy? All the data is const, and isn't going away.

... for now, but later patches start computing register offsets and
putting them into this structure :)

-- 
Hector Martin (marcan@xxxxxxxxx)
Public Key: https://mrcn.st/pub
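
P.S. For the static key refactor, this is roughly the shape I have in
mind (just an untested sketch; the key name and the two helper names
below are mine, not from the patch):

	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_FALSE(use_fast_ipi);

	/* Probe time: by now we know whether the hardware has fast IPIs. */
	if (irqc->info.fast_ipi)
		static_branch_enable(&use_fast_ipi);

	/*
	 * Hot path: this compiles down to a branch that gets patched when
	 * the key is flipped, so there's no load of the flag at runtime.
	 */
	if (static_branch_likely(&use_fast_ipi))
		aic_handle_ipi_fast();		/* hypothetical helper */
	else
		aic_handle_ipi_legacy();	/* hypothetical helper */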
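
P.P.S. For reading MPIDR_EL1 directly, I'm thinking of the existing
arm64 helpers from <asm/cputype.h>, which would also cover your first
point about MPIDR_AFFINITY_LEVEL(). A quick sketch of the idea
(untested; variable names are mine):

	#include <asm/cputype.h>

	/* Local CPU: read the sysreg instead of doing a percpu lookup. */
	u64 my_mpidr = read_cpuid_mpidr();
	u64 my_cluster = MPIDR_AFFINITY_LEVEL(my_mpidr, 1);

	/* Target CPU: still needs the logical map, it's a remote core. */
	u64 mpidr = cpu_logical_map(cpu);
	u64 idx = MPIDR_AFFINITY_LEVEL(mpidr, 0);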