On Sun, Jan 22, 2023, at 08:27, Christoph Hellwig wrote:
> On Sat, Jan 21, 2023 at 08:30:23PM +0100, Arnd Bergmann wrote:
>> I was thinking of using STATIC_CALL() as an optimization here, which
>> I find easier to read and understand than alternatives. One advantage
>> here is that this allows the actual cache operations to be declared
>> locally in the architecture without letting drivers call them,
>> but still update the common code to work without indirect branches.
>>
>> The main downside is that this is currently only optimized on
>> powerpc and x86, both of which don't actually need CPU specific
>> callbacks. ARC, ARM, and MIPS on the other hand already
>> have indirect function pointers, RISC-V would likely benefit the
>> most from either alternatives or static_call, as it already
>> uses alternatives and has one implementation that is clearly
>> preferred over the others.
>
> For now I'd just keep doing direct calls into the arch code, just
> for the lower level invalidate, writeback, invalidate+writeback
> calls as that helps cementing the logic of which of those to use
> in well documented core code.

Ok.

> And I'm not really sure I'd like to go beyond that - making it too
> easy pluggable will make people feel more comfortable doing stupid
> things here.

I fear the bigger risk is still making the functions callable
from device driver code than it is to make the functions
globally settable.

You introduced the mips version in f8c55dc6e828 ("MIPS: use generic
dma noncoherent ops for simple noncoherent platforms"), which
was clearly meant as an implementation detail, yet we already
have a driver that slipped in with 3bdffa8ffb45 ("Input: Add
N64 controller driver") that just calls this directly rather
than using the dma-mapping interface.
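
To illustrate the layering discussed above, here is a rough userspace
sketch, not kernel code and not taken from any posted patch. It shows
the three low-level operations kept file-local, with a single exported
entry point that decides which one to use. All sketch_* names are
hypothetical, the counters merely stand in for real cache maintenance,
and the direction-to-operation mapping is an assumption made for the
example rather than a claim about any architecture.

```c
/*
 * Userspace sketch only. All sketch_* names are hypothetical and
 * exist purely for illustration; none of them are kernel interfaces.
 */
#include <assert.h>
#include <stddef.h>

enum sketch_dir {
	SKETCH_TO_DEVICE,
	SKETCH_FROM_DEVICE,
	SKETCH_BIDIRECTIONAL,
};

enum sketch_op {
	SKETCH_OP_WBACK,
	SKETCH_OP_INV,
	SKETCH_OP_WBACK_INV,
	SKETCH_OP_MAX,
};

/* Call counters stand in for real cache maintenance instructions. */
static unsigned int sketch_calls[SKETCH_OP_MAX];

/*
 * Low-level operations: static, so only this file can call them.
 * Code built as a separate object has no symbol to link against.
 */
static void sketch_wback(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	sketch_calls[SKETCH_OP_WBACK]++;
}

static void sketch_inv(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	sketch_calls[SKETCH_OP_INV]++;
}

static void sketch_wback_inv(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	sketch_calls[SKETCH_OP_WBACK_INV]++;
}

/*
 * The single exported entry point. The mapping from direction to
 * operation below is an assumption chosen for the example.
 */
void sketch_sync_for_device(void *addr, size_t size, enum sketch_dir dir)
{
	switch (dir) {
	case SKETCH_TO_DEVICE:
		sketch_wback(addr, size);
		break;
	case SKETCH_FROM_DEVICE:
		sketch_inv(addr, size);
		break;
	case SKETCH_BIDIRECTIONAL:
		sketch_wback_inv(addr, size);
		break;
	default:
		assert(!"invalid direction");
	}
}

/* Read-only accessor so callers can observe which operation ran. */
unsigned int sketch_op_count(enum sketch_op op)
{
	assert(op < SKETCH_OP_MAX);
	return sketch_calls[op];
}
```

A caller in another file can only reach sketch_sync_for_device(), so
the choice between writeback, invalidate and writeback+invalidate
stays in one documented place, and a driver cannot bypass it by
calling one of the static helpers directly.
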

On the other hand, the indirect function pointers for
per-cpu cache operations are not easily translated anyway:
with the three architectures that multiplex between
cpu specific operations, arc uses physical addresses,
mips uses virtual addresses (because of highmem), and
arm even uses both because of incompatible requirements
between l1 and l2 cache operations. arm32 also seems to
have the superset of all possible corner cases that
one might see elsewhere (prefetching vs in-order,
writethrough vs writeback, broken broadcast invalidation,
...).

> And yes, maybe that's personal because I've warned
> the RISC-V people years ago that they'll need architectural
> cache management instructions yesterday and the answer was that
> no one is going to use them on modern CPUs. *sigh*

To be fair, from the ISA point of view, it really shouldn't
be necessary as long as you have a sane SoC design.
In practice there are always chips that are cutting corners,
or use the new CPU core as a drop-in for an existing
design. Arm SBSA tried to enforce the same thing and also
failed for pretty much the same reason.

      Arnd