> So I submit the following for your consideration.
> In the FP emulator, when we set up the trampoline, we set
> up the following above the user stack:
>
>     Instruction-to-be-executed
>     AdELOAD (unaligned load of r0 off of r0)
>     Magic-cookie-that-is-an-unimplemented-instruction
>     EPC-to-use-on-completion
>
> Rather than use thread.dsemul_epc as a flag to indicate
> to the unaligned access handler that there is emulation
> going on, the unaligned access handler would test to see
> if the unaligned access instruction itself was the AdELOAD,
> and that it is followed by the magic cookie, and if so, treat
> that as an indication to stuff the value following the sequence
> (as indicated by EPC, modulo branch delays) into the EPC
> and return. This way, one could nest an arbitrary number of
> emulation/signal/emulation sequences, and they should unroll
> correctly. The further magic cookie is needed since, even though
> it's a completely useless instruction, the AdELOAD could be
> encountered for other reasons, as a misguided attempt
> at cache prefetch, or by executing crashme. It's a useless
> instruction to emulate, having r0 as the destination, but the
> normal Linux semantics as I understand them would be to turn it
> into a very expensive no-op and continue in what might otherwise
> be safe and sane execution. With the additional code word after
> the AdELOAD (and before the EPC) on the dsemul stack that would be
> (more-or-less) guaranteed to really be an illegal instruction, the
> only risk we would be taking would be to have a program really
> containing the unaligned nonsense load followed by the illegal
> instruction blow up, not on the illegal instruction, but by
> picking up a bogus EPC. Not perfect. But maybe the lesser of
> several evils.
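
To make the quoted scheme concrete, here is a rough user-space sketch
of the frame layout and of the test the unaligned access handler would
perform. It is not code from the emulator. The names build_frame,
match_trampoline and the FRAME_* indices are hypothetical, the AdELOAD
encoding assumes the load is "lw $0, 1($0)", and the cookie value is
only a placeholder. Branch delay handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of "lw $0, 1($0)": an unaligned load of r0 off of r0. */
#define ADELOAD      0x8c000001u
/* Placeholder bit pattern, NOT a verified MIPS encoding. A real cookie
 * would have to decode as an unimplemented instruction on the target. */
#define MAGIC_COOKIE 0x0badc0deu

/* Hypothetical word indices of the trampoline frame above the user stack. */
enum {
    FRAME_INSN,     /* instruction to be executed */
    FRAME_ADELOAD,  /* traps once the instruction has completed */
    FRAME_COOKIE,   /* disambiguates a "natural" AdELOAD */
    FRAME_EPC,      /* EPC to use on completion */
    FRAME_WORDS
};

/* Fill in the four-word frame described in the quoted text. */
static void build_frame(uint32_t frame[FRAME_WORDS],
                        uint32_t insn, uint32_t resume_epc)
{
    assert(frame != NULL);
    frame[FRAME_INSN]    = insn;
    frame[FRAME_ADELOAD] = ADELOAD;
    frame[FRAME_COOKIE]  = MAGIC_COOKIE;
    frame[FRAME_EPC]     = resume_epc;
}

/*
 * Called with a pointer to the faulting instruction word. Returns true
 * and stores the saved EPC only if the fault was the AdELOAD followed
 * immediately by the cookie; otherwise the caller would emulate the
 * load as an ordinary (expensive) no-op.
 */
static bool match_trampoline(const uint32_t *fault_insn, uint32_t *new_epc)
{
    assert(fault_insn != NULL && new_epc != NULL);
    if (fault_insn[0] != ADELOAD || fault_insn[1] != MAGIC_COOKIE)
        return false;
    *new_epc = fault_insn[2];
    return true;
}
```

Because the check looks only at memory around the faulting instruction,
no per-thread state is consulted, which is what lets the frames nest to
any depth.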
One further embellishment and one further alternative to consider:

The thread data field currently used to store the post-trampoline EPC
value can be renamed and used instead as a counter of the "depth" of
FP branch delay slot emulation. If the count is zero, the unaligned
access trap does not look for the magic sequence described above, and
just emulates the AdELOAD instruction. Unfortunately, a signal handler
that invokes a longjmp will not return to the trampoline, will not
invoke the associated trap, and thus will not decrement the counter, so
a non-zero count cannot be taken as absolute proof that a branch
emulation is pending. The hack would further reduce the probability of
a "naturally occurring" AdELOAD/MagicCookie sequence being mishandled,
but would not eliminate the possibility.

The scheme proposed above allows arbitrary recursion (a Good Thing)
but also misdiagnosis of a wildly improbable but conceptually possible
error condition (a Bad Thing). An alternative - still in conjunction
with the proposed gap-between-user-stack-and-signal-stack technique -
would be to replace the single EPC storage location in the thread data
with, say, three storage locations and an additional variable that
serves as a sort of "stack pointer" for the EPCs. In this model, the
EPCs are kept off the user stack and no magic cookies are needed. If
nesting of FP branch emulation exceeds three, the process gets nailed
with a fatal error. This would not allow arbitrary recursion and would
increase the static size of the thread data structure (Bad Things),
but would, within the constraints of the allowed depth of recursion,
create no ambiguous situations (a Good Thing).

Any comments or declarations of preference? I guess it boils down to
the question of what is more probable, a naturally occurring sequence
of AdELOAD/MagicCookie or a 4-way nesting of signals delivered in the
branch emulation delay trampoline window.
Both strike me as unlikely, but I feel more confident about my
estimate of the former than of the latter.

Sorry to bore 95% of you to tears with this stuff, but it really does
matter to some of us, and I don't presume to know exactly who on the
list could have valuable input.

Regards,

Kevin K.
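
For reference, the bounded alternative described above (three EPC
slots plus a "stack pointer" in the thread data) could look roughly
like the following. This is only a sketch under assumptions: the
structure and function names are hypothetical and do not come from the
kernel sources, and the depth of three is simply the "say, three" from
the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DSEMUL_MAX_DEPTH 3  /* "say, three" storage locations */

/* Hypothetical replacement for the single saved-EPC field. */
struct dsemul_epc_stack {
    uint32_t epc[DSEMUL_MAX_DEPTH];
    unsigned int depth;     /* pending emulations; acts as stack pointer */
};

/*
 * Record the EPC to resume at when a trampoline is set up. Returns
 * false when nesting would exceed the limit, in which case the caller
 * would deliver the fatal error to the process.
 */
static bool dsemul_push_epc(struct dsemul_epc_stack *s, uint32_t epc)
{
    assert(s != NULL);
    if (s->depth >= DSEMUL_MAX_DEPTH)
        return false;
    s->epc[s->depth++] = epc;
    return true;
}

/*
 * Called from the unaligned access trap. Returns false when no
 * emulation is pending, so the AdELOAD is just emulated as a no-op.
 */
static bool dsemul_pop_epc(struct dsemul_epc_stack *s, uint32_t *epc)
{
    assert(s != NULL && epc != NULL);
    if (s->depth == 0)
        return false;
    *epc = s->epc[--s->depth];
    return true;
}
```

Note that this sketch has the same blind spot as the depth counter: a
signal handler that longjmps out never pops its entry, so stale EPCs
can accumulate and eat into the three available slots.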