On Tue, 2020-05-19 at 14:52 +0100, Richard Earnshaw wrote:
> Only d7?  No, that couldn't be right.  d7 would only be used if d0-d6
> had also been used.

I've looked at the disassembly again and my first description of
symptoms was indeed wrong, well - partially (; I've tried looking at a
bigger picture now and it seems that the parameters are not passed via
FPU registers, but FPU registers are used as intermediate helper
registers in a few places ("vldr" appears in the listing 24 times, this
application does not use any floating point types or functions). The
most common pattern is something like this:

-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --

080112e0 <distortos::SoftwareTimerCommon::start(std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)>:
{
 80112e0:  b500        push   {lr}
 80112e2:  b083        sub    sp, #12
 80112e4:  ed9d 7b04   vldr   d7, [sp, #16]
    softwareTimerControlBlock_.start(internal::getScheduler().getSoftwareTimerSupervisor(), timePoint, period);
 80112e8:  4904        ldr    r1, [pc, #16]  ; (80112fc <distortos::SoftwareTimerCommon::start(std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)+0x1c>)
 80112ea:  ed8d 7b00   vstr   d7, [sp]
 80112ee:  3008        adds   r0, #8
 80112f0:  f000 f840   bl     8011374 <distortos::internal::SoftwareTimerControlBlock::start(distortos::internal::SoftwareTimerSupervisor&, std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)>
}
 80112f4:  2000        movs   r0, #0
 80112f6:  b003        add    sp, #12
 80112f8:  f85d fb04   ldr.w  pc, [sp], #4
 80112fc:  20000a3c    .word  0x20000a3c

-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --

So it's a vldr followed by vstr (sometimes more than one), it seems
like a way to load 64-bit values in one step.

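Reading the first listing against the AAPCS argument rules, the
vldr/vstr pair looks like a plain 64-bit copy of "period" from the
incoming stack slot ([sp, #16] after the push and the sub) to the
outgoing one ([sp]): "this" is in r0, the first 64-bit argument stays
in r2:r3, and the callee takes one extra reference in r1. A rough
sketch of that call shape is below. All type and member names in it
are hypothetical stand-ins, not the distortos sources, and it has not
been checked that this reduced form actually makes the compiler emit
vldr/vstr; it is only meant as a possible starting point for a
testcase, built with the same target options as the real project.

```cpp
#include <cassert>   // only needed for quick host-side checks
#include <cstdint>

// Hypothetical stand-ins for the real types; names are illustrative only.
struct Supervisor
{
    std::int64_t lastPeriod {};
};

struct ControlBlock
{
    // Callee shape: (this, Supervisor&, 64-bit, 64-bit).
    // noinline (a GCC extension) keeps the call visible in the disassembly.
    __attribute__((noinline))
    int start(Supervisor& supervisor, std::int64_t timePoint, std::int64_t period)
    {
        supervisor.lastPeriod = period;
        timePoint_ = timePoint;
        return 0;
    }

    std::int64_t timePoint_ {};
};

// Stand-in for the object returned by getSoftwareTimerSupervisor().
Supervisor supervisor;

struct TimerCommon
{
    // Caller shape: (this, 64-bit, 64-bit).
    // On 32-bit ARM, timePoint arrives in r2:r3 and period on the stack;
    // forwarding it means copying 8 bytes from one stack slot to another.
    __attribute__((noinline))
    int start(std::int64_t timePoint, std::int64_t period)
    {
        controlBlock_.start(supervisor, timePoint, period);
        return 0;
    }

    ControlBlock controlBlock_;
};
```

If the pair does show up for TimerCommon::start in the disassembly of
this sketch, it would suggest that the FPU register is simply chosen
as the cheapest way to do a 64-bit stack-to-stack move.
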
This pattern appears in the code several times. It mostly uses d7, but
sometimes d8 or d6 (some parts use two registers in the same block of
code, d6 and d7). A few times the compiler uses s16 as a scratch
register, like this:

-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --

 80060fe:  f812 3b01   ldrb.w r3, [r2], #1
 8006102:  9204        str    r2, [sp, #16]
 8006104:  ee08 3a10   vmov   s16, r3
        const auto rawQueueWrapper = makeRawQueueWrapper<0>(dynamic, fifo);
 8006108:  f816 2b01   ldrb.w r2, [r6], #1
 800610c:  ee18 1a10   vmov   r1, s16
 8006110:  a809        add    r0, sp, #36  ; 0x24
 8006112:  f7ff ff8b   bl     800602c <std::unique_ptr<distortos::test::RawQueueWrapper, std::default_delete<distortos::test::RawQueueWrapper> >

-- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --

In this case it seems to make no sense at all. Why not just move from
r3 to r1 and be done with it (s16 is not used again in this function),
or why not load into r1 directly?

Sorry for the initial confusion, I hope that this time I'm more
precise (;

> No, those changes are for handling of 64-bit integral values where we
> no-longer use Neon to perform those options and have improved the way
> code is generated to handle them using the GP registers.

I see. I'm just looking for the answer to my basic question - is this a
bug or a feature? If it's a feature, then maybe there's a way to
disable it somehow.

> Testcase needed.

I could try providing one if you really think that what I see here is a
bug and not expected behaviour.

Regards,
FCh