On Mon, 30 Oct 2023 07:01:48 PDT (-0700), nadav.amit@xxxxxxxxx wrote:
> On Oct 30, 2023, at 3:30 PM, Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> wrote:
>> + on_each_cpu_mask(cmask,
>> + __ipi_flush_tlb_range_asid,
>> + &ftd, 1);
> Unrelated, but having fed
Do you mean `ftd`?
If so I'm not all that convinced that's a problem: sure it's 4x`long`,
so we pass it on the stack instead of registers, but otherwise we'd need
another `on_each_cpu_mask()` callback to shim stuff through via
registers.
> on the stack might cause it to be unaligned to
> the cacheline, which in x86 we have seen introduces some overhead.
We have 128-bit stack alignment on RISC-V, so the elements are at least
aligned. Since they're just being loaded up as scalars for the next
function call I'm not sure the alignment is all that exciting here.
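To put rough numbers on the alignment question, here is a minimal userspace
sketch (not kernel code). `struct flush_args`, `CACHELINE_BYTES` and
`lines_touched()` are hypothetical names used only for illustration, and the
64-byte line size is an assumption rather than something any particular
RISC-V core guarantees. It counts how many cache lines a 4x`long` block
touches when it starts at a given byte offset.

```c
#include <stddef.h>

/* Hypothetical stand-in for the four-long argument block (`ftd`). */
struct flush_args {
	unsigned long asid;
	unsigned long start;
	unsigned long size;
	unsigned long stride;
};

/* Assumed cache line size; real hardware may differ. */
#define CACHELINE_BYTES 64u

/*
 * Number of cache lines touched by an object of `len` bytes (len > 0)
 * that starts `offset` bytes past a cache line boundary.
 */
static unsigned int lines_touched(size_t offset, size_t len)
{
	size_t first = offset / CACHELINE_BYTES;
	size_t last = (offset + len - 1) / CACHELINE_BYTES;

	return (unsigned int)(last - first + 1);
}
```

With 8-byte `long`s the block is 32 bytes, so under these assumptions
`lines_touched(off, sizeof(struct flush_args))` stays at one line for offsets
0 through 32 and becomes two lines from offset 40 onward. Either way each
field is naturally aligned, so every individual scalar load stays within a
single line.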
> Actually, it is best not to put it on the stack, if possible to reduce
> cache traffic.
Sorry if I'm just missing something, but I'm not convinced this is a
measurable performance problem.