Re: [RFC PATCH] MIPS: optimise 32-bit do_div() with constant divisor

Ralf Baechle <ralf@xxxxxxxxxxxxxx> · Fri, 7 Nov 2014 12:35:45 +0100

On Fri, Nov 07, 2014 at 02:20:11AM +0000, Måns Rullgård wrote:
> 
> Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes:
> 
> > On Thu, Nov 06, 2014 at 04:23:18PM +0000, Mans Rullgard wrote:
> >
> >> This is an adaptation of the optimised do_div() for ARM by
> >> Nicolas Pitre implementing division by a constant using a
> >> multiplication by the inverse.  Ideally, the compiler would
> >> do this internally as it does for 32-bit operands, but it
> >> doesn't.
> >> 
> >> This version of the code requires an assembler with support
> >> for the DSP ASE syntax since accessing the hi/lo registers
> >> sanely from inline asm is impossible without this.  Building
> >> for a CPU without this extension still works, however.
> >> 
> >> It also does not protect against the WAR hazards for the
> >> hi/lo registers present on CPUs prior to MIPS IV.
> >> 
> >> I have only tested it as far as booting and light use with
> >> the BUG_ON enabled without encountering any issues.
> >> 
> >> The inverse computation code is a straight copy from ARM,
> >> so this should probably be moved to a shared location.
> >
> > Can you explain why you need __div64_fls()?  There's __fls() which on
> > MIPS is carefully written to make use of the CLZ or DCLZ instructions
> > where available; the fallback implementation looks fairly similar
> > to your code.
> 
> The regular __fls() doesn't necessarily evaluate at compile-time, which
> is required here.  The normal __fls() could of course be amended to
> bypass the CLZ instruction for constant arguments.
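
For what it's worth, a constant-foldable variant could look roughly like
the sketch below.  The names fls32() and fls32_generic() are made up for
illustration (they are neither the kernel's __fls() nor the patch's
__div64_fls()), the argument is assumed to be non-zero, and
__builtin_clz() merely stands in for an inline-asm CLZ.

```c
#include <stdint.h>

/*
 * Hypothetical helper: branchy, plain-C "find last set" that the
 * compiler can evaluate at build time.  Returns the 0-based index of
 * the most significant set bit.  x must be non-zero.
 */
static inline unsigned int fls32_generic(uint32_t x)
{
	unsigned int r = 31;

	if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
	if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
	if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
	if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
	if (!(x & 0x80000000u)) { r -= 1; }
	return r;
}

/*
 * Hypothetical wrapper: take the plain-C path for compile-time
 * constants, the count-leading-zeros path otherwise.  In the kernel
 * the second branch would be the CLZ inline asm; the GCC builtin is
 * used here only to keep the sketch portable.
 */
static inline unsigned int fls32(uint32_t x)
{
	if (__builtin_constant_p(x))
		return fls32_generic(x);
	return 31 - (unsigned int)__builtin_clz(x);
}
```

With optimisation enabled, a call such as fls32(1000) should then be
reducible to the constant 9, while run-time arguments still take the
CLZ-style path.
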
> 
> > MADD is named MAD on some older CPUs; yet other CPUs don't have it
> > at all.  I take it you tried to make GCC emit the instruction but it
> > doesn't?
> 
> GCC generates MADDU instructions only in the most trivial of cases.  For
> instance, (x >> 32) * (u32)y with 64-bit x and y produces far from
> optimal code.  In fact, looking at it again, I see it is even more
> stupid than I thought, so there needs to be more asm, not less.
> 
> Reading the manuals more carefully, it appears that MADDU is only
> reliably available starting with MIPS32 (btw, this information is
> annoyingly hard to find).  Thus this code should be restricted to such
> targets (which probably covers most current users) unless someone feels
> like writing a version for these older CPUs.
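
To make the discussion concrete, here is a portable C sketch of the
multiply-by-inverse idea from the patch description.  It is not the ARM
code and not the patch: the function names are invented, and instead of
choosing a multiplier that needs no correction it uses the simpler
floor((2^64 - 1) / b) and fixes the quotient up by at most one
afterwards.  The four 32x32->64 partial products in mulhi64() are the
sort of accumulation MADDU would be used for on MIPS32.

```c
#include <stdint.h>

/* High 64 bits of a 64x64-bit product, built from 32x32->64 multiplies. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Three values below 2^32 each: the sum cannot overflow 64 bits. */
	uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

/*
 * Hypothetical do_div()-like helper: replaces *n with *n / b and
 * returns *n % b.  b must be non-zero.  When b is a compile-time
 * constant, m is one too, so no division instruction remains.
 */
static uint32_t div64_by_inverse(uint64_t *n, uint32_t b)
{
	uint64_t m = ~0ULL / b;        /* approximate inverse of b       */
	uint64_t q = mulhi64(*n, m);   /* exact quotient, or one too low */
	uint64_t r = *n - q * b;       /* therefore 0 <= r < 2 * b       */

	if (r >= b) {                  /* single fix-up step             */
		q++;
		r -= b;
	}
	*n = q;
	return (uint32_t)r;
}
```

The fix-up costs a compare and a conditional add; the ARM version
avoids it by picking the multiplier, shift and bias more carefully at
compile time.
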

I'm primarily concerned about not enabling MADD where it's not available.
As for pre-MIPS32 processors, we can read through the manuals later to
work out where MAD(D) can be enabled.

As for access to hi/lo, I tried explicitly placing a variable in the lo
register.  That sort of works for very simple cases, but as expected it's
easy to get GCC to spill its RTL guts because it runs out of spill
registers.  It can perhaps be made to work, but I'd feel nervous about its
stability unless a GCC guru approved this method.

  Ralf