On 19/05/2020 21:28, Freddie Chopin wrote:
> On Tue, 2020-05-19 at 14:52 +0100, Richard Earnshaw wrote:
>> Only d7? No, that couldn't be right. d7 would only be used if d0-d6
>> had also been used.
>
> I've looked at the disassembly again and my first description of
> symptoms was indeed wrong, well - partially (; I've tried looking at a
> bigger picture now and it seems that the parameters are not passed via
> FPU registers, but FPU registers are used as intermediate helper
> registers in a few places ("vldr" appears in the listing 24 times, this
> application does not use any floating point types or functions).
>
> The most common pattern is something like this:
>
> -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
>
> 080112e0 <distortos::SoftwareTimerCommon::start(std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)>:
> {
>  80112e0:  b500        push    {lr}
>  80112e2:  b083        sub     sp, #12
>  80112e4:  ed9d 7b04   vldr    d7, [sp, #16]
>     softwareTimerControlBlock_.start(internal::getScheduler().getSoftwareTimerSupervisor(), timePoint, period);
>  80112e8:  4904        ldr     r1, [pc, #16]   ; (80112fc <distortos::SoftwareTimerCommon::start(std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)+0x1c>)
>  80112ea:  ed8d 7b00   vstr    d7, [sp]
>  80112ee:  3008        adds    r0, #8
>  80112f0:  f000 f840   bl      8011374 <distortos::internal::SoftwareTimerControlBlock::start(distortos::internal::SoftwareTimerSupervisor&, std::chrono::time_point<distortos::TickClock, std::chrono::duration<long long, std::ratio<1ll, 1000ll> > >, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >)>
> }
>  80112f4:  2000        movs    r0, #0
>  80112f6:  b003        add     sp, #12
>  80112f8:  f85d fb04   ldr.w   pc, [sp], #4
>  80112fc:  20000a3c    .word   0x20000a3c
>
> -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
>
> So it's a vldr followed by vstr (sometimes more than one), it seems
> like a way to load 64-bit values in one step. Such pattern appears in
> the code several times, it uses mostly d7, but sometimes d8 or d6 (some
> parts use two registers in the same block of code, d6 and d7).
>
> A few times compiler uses s16 as a scratch register like this:
>
> -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
>
>  80060fe:  f812 3b01   ldrb.w  r3, [r2], #1
>  8006102:  9204        str     r2, [sp, #16]
>  8006104:  ee08 3a10   vmov    s16, r3
>     const auto rawQueueWrapper = makeRawQueueWrapper<0>(dynamic, fifo);
>  8006108:  f816 2b01   ldrb.w  r2, [r6], #1
>  800610c:  ee18 1a10   vmov    r1, s16
>  8006110:  a809        add     r0, sp, #36     ; 0x24
>  8006112:  f7ff ff8b   bl      800602c <std::unique_ptr<distortos::test::RawQueueWrapper, std::default_delete<distortos::test::RawQueueWrapper> >
>
> -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 -- >8 --
>
> In this case it seems to make no sense at all, why not just move from
> r3 to r1 and be done with that (s16 is not used again in this
> function), or why not load into r1 directly?
>
> Sorry for the initial confusion, I hope that this time I'm more precise
> (;
>
>> No, those changes are for handling of 64-bit integral values where we
>> no-longer use Neon to perform those options and have improved the way
>> code is generated to handle them using the GP registers.
>
> I see. I'm just looking for the answer to my basic question - is this a
> bug or a feature? If it's a feature, then maybe there's a way to
> disable it somehow.
>
>> Testcase needed.
>
> I could try providing one if you really think that what I see here is a
> bug, not an expected behaviour.
>
> Regards,
> FCh
>

OK, so not a bug then. Phew!

TL;DR: last year I was playing with some changes to handle 64-bit copies more efficiently, but it turned into a major can of worms, and untangling it proved infeasible before the end of the stage-1 development window.
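A note on the first listing: under the AAPCS, `this` is passed in r0, the 64-bit `timePoint` in the even/odd register pair r2:r3 (r1 is skipped to keep the pair aligned), and the 64-bit `period` on the stack. That is why `period` is found at [sp, #16] after the 16 bytes taken by `push {lr}` and `sub sp, #12`. Forwarding `period` to the callee therefore amounts to copying 8 bytes from the incoming stack argument to the outgoing one, and GCC performs that copy with vldr/vstr through d7. The sketch below is a hypothetical reduction of that shape, the kind of thing a testcase might start from. `Supervisor`, `ControlBlock`, `TimerCommon`, `globalSupervisor` and `demo()` are illustrative names, and `std::chrono::steady_clock` stands in for `distortos::TickClock`. Whether a particular GCC version and set of flags actually emits vldr/vstr for it is an assumption to be checked against the disassembly.

```cpp
// Hypothetical reduction of the forwarding in SoftwareTimerCommon::start().
// Supervisor, ControlBlock, TimerCommon, globalSupervisor and demo() are
// illustrative stand-ins, not the real distortos types or functions.
#include <cassert>
#include <chrono>

// Same representation as in the listing: a 64-bit count of milliseconds.
using Duration = std::chrono::duration<long long, std::milli>;
// steady_clock stands in for distortos::TickClock.
using TimePoint = std::chrono::time_point<std::chrono::steady_clock, Duration>;

struct Supervisor {
    int id;
};

// Stands in for internal::getScheduler().getSoftwareTimerSupervisor().
Supervisor globalSupervisor{1};

struct ControlBlock {
    const Supervisor* supervisor = nullptr;
    TimePoint timePoint{};
    Duration period{};

    // AAPCS: r0 = this, r1 = &s, r2:r3 = tp, p on the stack.
    [[gnu::noinline]] int start(Supervisor& s, TimePoint tp, Duration p) {
        supervisor = &s;
        timePoint = tp;
        period = p;
        return 0;
    }
};

struct TimerCommon {
    ControlBlock controlBlock;

    // AAPCS: r0 = this, r1 skipped, r2:r3 = tp, p on the stack. Passing p on
    // to ControlBlock::start means copying 8 bytes from the incoming stack
    // slot to the outgoing one, while r2:r3 can be passed through untouched.
    [[gnu::noinline]] int start(TimePoint tp, Duration p) {
        return controlBlock.start(globalSupervisor, tp, p);
    }
};

// Usage: forward one start request and check that nothing was lost.
inline void demo() {
    TimerCommon timer{};
    const TimePoint tp{Duration{1000}};
    const Duration p{250};
    const int rc = timer.start(tp, p);
    assert(rc == 0);
    assert(timer.controlBlock.period == p);
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
}
```

To check the codegen, compile this for the same target (for example `arm-none-eabi-g++ -O2 -c` with the project's own -mcpu/-mfpu/-mfloat-abi options) and run `arm-none-eabi-objdump -d` on the object. The output should show whether `TimerCommon::start` moves `p` through a D register or through core registers.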
The issue here is that the Arm architecture is not sufficiently orthogonal in its memory addressing modes (different load and store instructions accept different offset ranges and addressing forms), whereas GCC likes to assume that all architectures are fundamentally orthogonal in this respect. It really sucks. Add to that the compiler's tendency to pun modes when it thinks it might not matter (for example, moving a 64-bit integer through a VFP D register as if it were a double), and you end up with a mess that's quite hard to untangle in a reasonable way.

I might look at those patches again this year, but that depends on what else I have on my plate during the development window.

R.
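The following is a source-level analogy only. GCC does this kind of punning on its internal machine modes, such as DImode and DFmode, not in user C++ code. Moving 64 bits through a register is a pure bit copy, so copying an integer through a double-sized scratch register returns exactly the same value. That is why the compiler may consider the two modes interchangeable for a plain copy. The sketch makes this explicit with `std::memcpy`. `copyViaDouble` and `checkRoundTrip` are hypothetical helper names. The sketch does not show whether the punned copy is actually cheaper than using core registers, which is the real question in the cases quoted above.

```cpp
// Source-level analogy of a DImode -> DFmode "pun" for a plain 64-bit copy.
// copyViaDouble and checkRoundTrip are hypothetical illustration helpers.
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "this analogy assumes a 64-bit double, as on Arm");

// Copies 'value' by way of a double object, the way a punned copy moves the
// bits through a D register (vldr/vstr) instead of a pair of core registers.
std::int64_t copyViaDouble(std::int64_t value) {
    double scratch;                                  // plays the role of d7
    std::memcpy(&scratch, &value, sizeof scratch);   // bits in, like vldr
    std::int64_t result;
    std::memcpy(&result, &scratch, sizeof result);   // bits out, like vstr
    return result;
}

// The round trip never changes the value, even for bit patterns that would
// be NaNs when read as a double, because no arithmetic touches 'scratch'.
inline void checkRoundTrip(std::int64_t value) {
    const std::int64_t copy = copyViaDouble(value);
    assert(copy == value);
    (void)copy;  // avoid an unused-variable warning when NDEBUG is defined
}
```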