Christophe Leroy wrote:
> ldimm64 is not only used for loading function addresses, and

That's probably true today, but I worry that that can change upstream and we may not notice at all.

> the NOPs added for padding are impacting performance, so avoid them
> when not necessary.
>
> On QEMU mac99, with the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 167436810 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 170702940 PASS
>
> Without the patch:
>
> test_bpf: #829 ALU64_MOV_K: all immediate value magnitudes jited:1 173012360 PASS
> test_bpf: #831 ALU64_OR_K: all immediate value magnitudes jited:1 176424090 PASS
>
> That's a 3.5% performance improvement.
A better approach would be to do a full JIT during the extra pass. That's what most other architectures do today. And, as long as we can ensure that the JIT'ed program size can never increase during the extra pass, we should be ok to do a single extra pass.
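
To illustrate the invariant I have in mind, here is a rough standalone sketch, not actual kernel code. All names (insns_for_imm32, ldimm64_insns, jit_pass, jit_extra_pass_fits) and the encoding rules are made up for illustration; the only assumption is that the sizing pass reserves the worst case for each 64-bit immediate load, and that the extra pass emits only what is needed and must fit in what was reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed worst case: two instructions per 32-bit half. */
#define MAX_LDIMM64_INSNS 4

/* Instructions needed to load one 32-bit value (hypothetical rules). */
static int insns_for_imm32(uint32_t v)
{
	if (v <= 0x7fff || v >= 0xffff8000u)
		return 1;	/* fits a sign-extended 16-bit immediate */
	if ((v & 0xffff) == 0)
		return 1;	/* only the upper 16 bits are set */
	return 2;		/* upper half, then OR in the lower half */
}

/*
 * Size of one 64-bit immediate load, in instructions.
 * pad != 0 models the sizing pass, which reserves the worst case.
 */
static int ldimm64_insns(uint64_t imm, int pad)
{
	int n = insns_for_imm32((uint32_t)(imm >> 32)) +
		insns_for_imm32((uint32_t)imm);

	assert(n <= MAX_LDIMM64_INSNS);
	return pad ? MAX_LDIMM64_INSNS : n;
}

/* Total size of one pass over a list of immediates. */
static size_t jit_pass(const uint64_t *imms, size_t count, int pad)
{
	size_t total = 0;

	for (size_t i = 0; i < count; i++)
		total += ldimm64_insns(imms[i], pad);
	return total;
}

/* Returns 0 if the extra pass fits in the reserved size, -1 otherwise. */
static int jit_extra_pass_fits(const uint64_t *imms, size_t count)
{
	size_t reserved = jit_pass(imms, count, 1);	/* sizing pass */
	size_t actual = jit_pass(imms, count, 0);	/* extra pass */

	return actual <= reserved ? 0 : -1;
}
```

As long as the per-instruction worst case holds, the check in jit_extra_pass_fits() can never fail, which is the property that would make a single extra pass safe.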
- Naveen