On Tue, 1 Jul 2014, Will Deacon wrote:

> Hi Mans,
>
> On Tue, Jul 01, 2014 at 06:24:43PM +0100, Måns Rullgård wrote:
> > Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> writes:
> > > As you point out, "bx lr" /may/ be treated specially (I've actually been
> >
> > Most, if not all, Cortex-A cores do this according the public TRMs.
> > They also do the same thing for "mov pc, lr" so there will probably be
> > no performance gain from this change.  It's still a good idea though,
> > since we don't know what future cores will do.
>
> Funnily enough, that's not actually true (and is more or less what prompted
> this patch after discussion with Russell). There are cores out there that
> don't predict mov pc, lr at all (let alone do anything with the return
> stack).
>
> > > discussing this with Will Deacon over the last couple of days, who has
> > > also been talking to the hardware people in ARM, and Will is happy with
> > > this patch as in its current form.)  This is why I've changed all
> > > "mov pc, reg" instructions which return in some way to use this macro,
> > > and left others (those which are used to call some function and return
> > > back to the same point) alone.
> >
> > In that case the patch should be fine.  Your patch description didn't
> > make it clear that only actual returns were being changed.
>
> I'm led to believe that some predictors require lr in order to update the
> return stack, whilst others don't. That part is all horribly
> micro-architectural, so the current patch is doing the right thing by
> sticking to the ARM ARM but enabling us to hook into other registers later
> on if we choose.

May I suggest to have a patch with only the macro definition in it and
all this discussion in the commit log please?  The usage sites
should be done in a separate patch to make it clearer.


Nicolas