Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes: > On Fri, Nov 07, 2014 at 02:20:11AM +0000, Måns Rullgård wrote: >> Date: Fri, 07 Nov 2014 02:20:11 +0000 >> From: Måns Rullgård <mans@xxxxxxxxx> >> To: Ralf Baechle <ralf@xxxxxxxxxxxxxx> >> Cc: linux-mips@xxxxxxxxxxxxxx >> Subject: Re: [RFC PATCH] MIPS: optimise 32-bit do_div() with constant >> divisor >> Content-Type: text/plain; charset=iso-8859-1 >> >> Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes: >> >> > On Thu, Nov 06, 2014 at 04:23:18PM +0000, Mans Rullgard wrote: >> > >> >> This is an adaptation of the optimised do_div() for ARM by >> >> Nicolas Pitre implementing division by a constant using a >> >> multiplication by the inverse. Ideally, the compiler would >> >> do this internally as it does for 32-bit operands, but it >> >> doesn't. >> >> >> >> This version of the code requires an assembler with support >> >> for the DSP ASE syntax since accessing the hi/lo registers >> >> sanely from inline asm is impossible without this. Building >> >> for a CPU without this extension still works, however. >> >> >> >> It also does not protect against the WAR hazards for the >> >> hi/lo registers present on CPUs prior to MIPS IV. >> >> >> >> I have only tested it as far as booting and light use with >> >> the BUG_ON enabled wihtout encountering any issues. >> >> >> >> The inverse computation code is a straight copy from ARM, >> >> so this should probably be moved to a shared location. >> > >> > Can you explain why you need __div64_fls()? There's __fls() which on >> > MIPS is carefully written to make use of the CLZ rsp. DCLZ instructions >> > where available; the fallback implementation is looking fairly similar >> > to your code. >> >> The regular __fls() doesn't necessarily evaluate at compile-time, which >> is required here. The normal __fls() could of course be amended to >> bypass the CLZ instruction for constant arguments. >> >> > MADD is named MAD on some older CPUs; yet other CPUs don't have it >> > at all. I take it you tried to make GCC emit the instruction but it >> > doesn't? >> >> GCC generates MADDU instructions only in the most trivial of cases. For >> instance, (x >> 32) * (u32)y with 64-bit x and y produces far from >> optimal code. In fact, looking at it again, I see it is even more >> stupid than I thought, so there needs to be more asm, not less. >> >> Reading the manuals more carefully, it appears that MADDU is only >> reliably available starting with MIPS32 (btw, this information is >> annoyingly hard to find). Thus this code should be restricted to such >> targets (which probably covers most current users) unless someone feels >> like writing a version for these older CPUs. > > I'm primarily concered about not enabling MADD where it's not available. > As for pre-MIPS32 processors - we can do the manual reading about where > to enable MAD(D) later. A simple ifdef will take care of that. > As for access to hi/lo, I tried to explicitly put a variable in the lo > register. Which sort of works for very simple cases but as expected it's > easy to get GCC to spill its RTL guts because it runs out of spill > registers. It maybe can be made to work but I'd feel nervous about its > stability unless a GCC guru approved this method. The "x" constraint can be used to move a double-word to/from the hi/lo registers. On DSP targets, the "ka" constraint provides access to the three additional hi/lo pairs while on a non-DSP targets it degenerates to "x". The "ka" constraint is available since gcc 4.3.0. I see no reason not to allow this extra flexibility. -- Måns Rullgård mans@xxxxxxxxx