On Sun, 16 Dec 2018, Andy Lutomirski wrote:

> > I think it suffices to emulate what compilers generate in delay slots,
> > which should be fairly minimal and stable. At the very least we could
> > enumerate everything GCC and LLVM already emit there, and get them to
> > upstream a policy of not adding new insns as fpu-delay-slot-allowed.
> > If someone is writing asm by hand to do ridiculous things in the delay
> > slot with random ISA extensions, they shouldn't expect it to work.
> >
>
> I feel like I have to ask: the real thing preventing emulation is that
> new nonstandard instructions might get used in FPU delay slots on
> non-FPU-supporting hardware? This seems utterly nuts. If you're
> using custom ISA extensions, why on Earth are you also using emulated
> floating point instructions? You're targetting a specific known CPU
> if you do this, so you should use only instructions that actually work
> on that CPU.

The FPU is a part of the MIPS/Linux psABI and as far as CPU hardware is
concerned it is typically an RTL option for the customer to control when
synthesising hardware, just like say the sizes of the caches. IOW you'll
have some hardware with FPU and some without that is otherwise identical,
and maintaining two sets of binaries for what is a part of the psABI
anyway is often seen as not technically or commercially justified. E.g.
the (somewhat dated now) 24KEf and 24KEc are complementing standard
MIPS32r2+DSP processor cores with and without the FPU respectively.

Of course you can stick to the soft-float ABI instead, but then you'll be
wasting the FPU resource on FPU cores, so using the hard-float ABI and
having instructions emulated on non-FPU cores is usually considered a good
compromise.

Of course the FPU emulator should have been left to the userland rather
than put in the kernel, but that mistake was made many years ago and we
need to maintain compatibility. Also someone would actually have to
implement that userland emulator.

FAOD both GCC and GAS will happily schedule delay slots themselves as long
as the candidate instruction is recognised as valid in a delay slot, so
there's no need for anyone to do anything manually for the less common
instructions to end up in a delay slot. They just need to appear right
before a branch or a jump for that to happen. I can't speak for LLVM.

Maciej