Mi Maciej, On Mon, Feb 17, 2025 at 11:23:11PM +0000, Maciej W. Rozycki wrote: > > > Why is this code breaking stack alignment just to have to fix it up two > > > instructions down the line? Or is it that the incoming $sp is not aligned > > > in the first place (in which case we're having a deeper problem). > > > > nolibc itself does not assume that $sp is aligned. > > Maybe Willy can explain the historical background. > > I'm all ears. I had a look, that's interesting. Actually this started in the very early i386 code in nolibc, where we already use the stack in the entry code, and simply fix it up before calling main. Then we pursued this with x86_64 (which also uses the stack and needs to fix it up), then arm (where we use and fix the stack), then the MIPS entry code was simply written based on the same construct while it does not use the stack thus does indeed not need to be fixed. Here are the links, it predates kernel inclusion: arm: https://github.com/wtarreau/nolibc/commit/af968b1 mips: https://github.com/wtarreau/nolibc/commit/e04cd25 Thus we can shave 4 more bytes from the MIPS entry code ;-) > > > > +#endif > > > > + > > > > "jalr $t9\n" /* transfer to c runtime > > > > */ > > > > " nop\n" /* delayed slot > > > > > > On an unrelated matter JALR above ought to be JAL (or otherwise there's > > > no point in using the .cprestore pseudo-op). And I fail to see why this > > > code has to be "noreorder" (except for the .cpload piece, of course), it's > > > just asking for troubles. > > > > Thanks for the hints. > > > > Without "noreorder", is the manually addition of the delayed slot "nop" > > still necessary? > > It's not. It's for the `.set noreorder' mode only, to fill the branch > delay slot by hand. Otherwise it'll become just a useless instruction > following the call sequence and executed after return. Similarly I had ".set noreorder" and the nop in the initial code from 2017. I remember that it took me a long while to get that init code to work on my MIPS boxes, and it's extremely likely that I found the noreorder trick long after I placed the nop and did not remove it. > > These points also apply to the existing O32 implementation, right? > > Correct. Sadly it's the first time I see this code. > > Overall I find it a bit of a chimera: it uses `.set noreorder' and > explicit relocations on one hand, and then high-level assembly `.cpload' > and `.cprestore' pseudo-ops on the other, effectively mixing the two > styles of assembly. > > The pseudo-ops come from times when using assembly macros was the norm > and are there to support that coding model where macros rely on these > pseudo-ops, and before the use of explicit relocations became the norm at > least for GCC. In the absence of assembly macros you can write code > expansions for these pseudo-ops by hand, just as what GCC does nowadays > (in the `-mexplicit-relocs' mode, which is usually the default). > > But due to architecture variations it's very hard to write handcoded > assembly in the `.set noreorder' mode: you need to take care of all the > data dependencies that appear due to the lack of interlocking between > pipeline stages, which results in code that either works correctly > everywhere, but is suboptimal for newer architecture revisions, or code > that works better with newer architecture revisions, but is actually > broken with earlier ones. > > About the only typical case where you do want to use `.set noreorder' is > to schedule a branch delay slot by hand due to a data anti-dependency > between the two instructions. Patchable/self-modifying code is a less > frequent case. And I had literally one case in my entire career where I > wanted to actually jump to a branch delay slot instruction (it's still > there in arch/mips/include/asm/div64.h, in do_div64_32(); see label #2). > > Also it appears no code in this function actually relies on $gp having > been set up, so perhaps this stuff can just be discarded in the first > place? All of this is very helpful. For example we did face an issue with noreorder here, that required to then add ".set push" then ".set pop" as it was polluting the rest of the code: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=184177c3d6e0 But I'm pretty sure that I didn't invent this ".set noreorder" and if I had found how to get rid of it, I'd happily done it. Like often with such code, it was produced by trial and error, and it's very possible that one solution was added to cover the problem caused by another one, and was not optimal. For me by then, the code was considered OK enough when my preinit program would work fine on all my machines (mostly mips-24Kc and mips-1004Kc that I daily have access to). In any case that's something that's easy to try again if we want to clean that up to normalize it. Thanks! Willy