Stanislav!

On Wed, Apr 12 2023 at 09:36, Stanislav Kinsburskii wrote:
> On Wed, Apr 12, 2023 at 09:19:51AM -0700, Stanislav Kinsburskii wrote:
>> > > +	affinity = irq_data_get_effective_affinity_mask(data);
>> > > +	cpu = cpumask_first_and(affinity, cpu_online_mask);
>> >
>> > The effective affinity mask of MSI interrupts consists only of online
>> > CPUs, to be accurate: it has exactly one online CPU set.
>> >
>> > But even if it would have only offline CPUs then the result would be:
>> >
>> >     cpu = nr_cpu_ids
>> >
>> > which is definitely invalid. While a disabled vector targeted to an
>> > offline CPU is not necessarily invalid.
>
> Although this patch only tosses the code and doesn't make any functional
> changes, I guess if the fix for the used cpu id is required, it has to
> be in a separated patch.

Sure.

> Would you mind to elaborate more of the problem(s)?
> Do you mean that the result of cpumask_first_and has to be checked for not
> being >= nr_cpu_ids?
> Or do you mean that there is no need to check the irq affinity against
> cpu_online_mask at all and we can simply take any first bit from the
> effective affinity mask?

As of today the effective mask of MSI interrupts contains only online
CPUs. I don't see a reason for that to change.

> Also, could you elaborate more on the disabled vector targeting an
> offline CPU? Is there any use case for such scenario (in this case we
> might want to support it)?

I'm not aware of one today. That was more theoretical reasoning.

> I guess the goal of this code is to make sure that hypervisor won't be
> configured to deliver an MSI to an offline CPU.

Correct, but if the interrupt _is_ masked at the MSI level then the
hypervisor must not deliver an interrupt at all.

The point is that it is valid to target a masked MSI entry to an offline
CPU under the assumption that the hardware/emulation respects the
masking. Whether that's a good idea or not is a different question.

The kernel as of today does not do that.
It targets unused but configured
MSI[-x] entries towards MANAGED_IRQ_SHUTDOWN_VECTOR on CPU0 for various
reasons, one of them being paranoia. But in principle there is nothing
wrong with that, and it should either succeed or be rejected at the
software level, not expose a completely invalid CPU number to the
hypercall in the first place.

So if you want to be defensive, then keep the _and(), but check the
result for being valid and emit something useful like a pr_warn_once()
instead of blindly handing the invalid result to the hypercall and then
having that reject it with some undecipherable error code.

Actually it would not necessarily reach the hypercall, because before
that it dereferences cpumask_of(nr_cpu_ids) here:

	nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));

and explodes with a kernel pagefault. If not, it will read some random
adjacent data and try to create a vp_set from it.

None of that is anywhere close to correct.

Thanks,

        tglx
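The defensive variant suggested above (keep the _and(), validate the result, warn once, and reject in software) can be sketched as a small userspace model. This is a hypothetical illustration, not the actual Hyper-V PCI code: a 64-bit word stands in for struct cpumask, NR_CPU_IDS stands in for nr_cpu_ids, model_first_and() mimics the documented cpumask_first_and() behaviour of returning an index >= nr_cpu_ids when the intersection is empty, and an fprintf() guarded by a flag stands in for pr_warn_once().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for nr_cpu_ids in this userspace model. */
#define NR_CPU_IDS 8u

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef unsigned long long cpumask_model_t;

/*
 * Models cpumask_first_and(): returns the first CPU set in both masks,
 * or NR_CPU_IDS when the intersection is empty.
 */
static unsigned int model_first_and(cpumask_model_t a, cpumask_model_t b)
{
	cpumask_model_t both = a & b;

	for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1ULL << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Models the state behind pr_warn_once(). */
static bool warned_once;

/*
 * Hypothetical helper: pick the target CPU from the effective affinity,
 * rejecting an out-of-range result before it could be passed to
 * something like cpumask_of() or the hypercall. *cpu_out is only
 * written on success.
 */
static int pick_target_cpu(cpumask_model_t effective, cpumask_model_t online,
			   unsigned int *cpu_out)
{
	unsigned int cpu = model_first_and(effective, online);

	if (cpu >= NR_CPU_IDS) {
		/* Stand-in for pr_warn_once(): report only the first failure. */
		if (!warned_once) {
			warned_once = true;
			fprintf(stderr, "no online CPU in effective affinity mask\n");
		}
		return -EINVAL;
	}

	*cpu_out = cpu;
	return 0;
}
```

For example, pick_target_cpu(1ULL << 2, 0xFFULL, &cpu) succeeds with cpu == 2, while pick_target_cpu(1ULL << 3, 0xF7ULL, &cpu), where CPU3 is the only CPU in the effective mask and is offline, returns -EINVAL and leaves cpu untouched, so the invalid index never reaches the mask lookup. The `>=` comparison mirrors the usual kernel idiom for checking cpumask search results.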