On Tue, Jun 11, 2024 at 09:56:17AM -0700, Linus Torvalds wrote:
> On Tue, 11 Jun 2024 at 07:29, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Do we expect to use this more widely? If this only really matters for
> > d_hash() it might be better to handle this via the alternatives
> > framework with callbacks and avoid the need for new infrastructure.
>
> Hmm. The notion of a callback for alternatives is intriguing and would
> be very generic, but we don't have anything like that right now.
>
> Is anybody willing to implement something like that? Because while I
> like the idea, it sounds like a much bigger change.

Fair enough if that's a pain on x86, but we already have callback
alternatives on arm64, and hence using them is a smaller change there.
We already have a couple of cases which use a MOVZ;MOVK;MOVK;MOVK
sequence, e.g.

	// in __invalidate_icache_max_range()
	asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
				    "movk %0, #0, lsl #16\n"
				    "movk %0, #0, lsl #32\n"
				    "movk %0, #0, lsl #48\n",
				    ARM64_ALWAYS_SYSTEM,
				    kvm_compute_final_ctr_el0)
		     : "=r" (ctr));

... which is patched via the callback:

	void kvm_compute_final_ctr_el0(struct alt_instr *alt,
				       __le32 *origptr, __le32 *updptr,
				       int nr_inst)
	{
		generate_mov_q(read_sanitised_ftr_reg(SYS_CTR_EL0),
			       origptr, updptr, nr_inst);
	}

... where the generate_mov_q() helper does the actual instruction
generation. So if we only care about a few specific constants, we could
give them their own callbacks, like kvm_compute_final_ctr_el0() above.

[...]

> > We have some helpers for instruction manipulation, and we can use
> > aarch64_insn_encode_immediate() here, e.g.
> >
> > #include <asm/insn.h>
> >
> > static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
> > {
> > 	u32 insn = le32_to_cpu(*p);
> > 	insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_16, insn, val);
> > 	*p = cpu_to_le32(insn);
> > }
>
> Ugh. I did that, and then noticed that it makes the generated code
> about ten times bigger.
>
> That interface looks positively broken.
>
> There is absolutely nobody who actually wants a dynamic argument, so
> it would have made both the callers and the implementation *much*
> simpler had the "AARCH64_INSN_IMM_16" been encoded in the function
> name the way I did it for my instruction rewriting.
>
> It would have made the use of it simpler, it would have avoided all
> the "switch (type)" garbage, and it would have made it all generate
> much better code.

Oh, completely agreed. FWIW, I have better versions sat in my
arm64/insn/rework branch, but I haven't had the time to get all the
rest of the insn framework cleanup sorted:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/insn/rework&id=9cf0ec088c9d5324c60933bf3924176fea0a4d0b

I can go prioritise getting that bit out if it'd help, or I can clean
this up later.

Those versions allow the compiler to do much better, including
compile-time (or runtime) checks that immediates fit. For example:

	void encode_imm16(__le32 *p, u16 imm)
	{
		u32 insn = le32_to_cpu(*p);

		// Would warn if 'imm' were u32.
		// As u16 always fits, no warning.
		BUILD_BUG_ON(!aarch64_insn_try_encode_unsigned_imm16(&insn, imm));

		*p = cpu_to_le32(insn);
	}

... compiles to:

	<encode_imm16>:
		ldr	w2, [x0]
		bfi	w2, w1, #5, #16
		str	w2, [x0]
		ret

... which I think is what you want?
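To give a concrete idea of why the compiler does so well here, the
imm16 case boils down to something like the sketch below. Note this is
written from memory as an illustration, not the exact code in that
branch:

	/*
	 * Sketch of a width-specific encoder; assumes a MOVZ/MOVK-style
	 * encoding and may differ from the real code in the branch above.
	 */
	static __always_inline bool
	aarch64_insn_try_encode_unsigned_imm16(u32 *insn, u64 imm)
	{
		if (imm > U16_MAX)
			return false;

		/* MOVZ/MOVK hold their 16-bit immediate in bits [20:5]. */
		*insn &= ~GENMASK(20, 5);
		*insn |= (u32)imm << 5;

		return true;
	}

Since the helper is __always_inline with a constant bound, the range
check constant-folds away for a u16 argument, so the BUILD_BUG_ON()
above disappears and all that's left is the single BFI.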
> So I did that change you suggested, and then undid it again.
>
> Because that whole aarch64_insn_encode_immediate() thing is an
> abomination, and should be burned at the stake. It's misdesigned in
> the *worst* possible way.
>
> And no, this code isn't performance-critical, but I have some taste,
> and the code I write will not be using that garbage.

Fair enough.

Mark.