From: Holger Lubitz
> Sent: 24 December 2022 09:34
>
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
>
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML

I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.

The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.

>
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
>
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
>
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
>
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
>
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
>
> #1 (optimized for minimum instruction count in loop,
>     68k + Coldfire ISA_B)
>
> int strcmp1(const char *cs, const char *ct)
> {
>         int res;
>
>         asm ("\n"
>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>                 "       jeq     2f\n"           /* end of first string? */
>                 "       cmp.b   (%1)+,%2\n"     /* compare *ct */
>                 "       jeq     1b\n"           /* if equal, continue */
>                 "       jra     3f\n"           /* else skip to tail */
>                 "2:     cmp.b   (%1)+,%2\n"     /* compare one last byte */
>                 "3:     subx.l  %2, %2\n"       /* -1 if borrow, 0 if not */
>                 "       jls     4f\n"           /* if set, z is from sub.b */

The subx will set Z unless C was set.
So that doesn't seem right.

>                 "       moveq.l #1, %2\n"       /* 1 if !borrow */
>                 "4:"
>                 : "+a" (cs), "+a" (ct), "=d" (res));
>         return res;
> }

I think this should work:
(But the jc might need to be jnc.)

                "       moveq.l #0,%2\n"        /* zero high bits of result */
                "1:     move.b  (%1)+,%2\n"     /* get *ct */
                "       jeq     2f\n"           /* end of second string? */
                "       cmp.b   (%0)+,%2\n"     /* compare *cs */
                "       jeq     1b\n"           /* if equal, continue */
                "       jc      4f\n"           /* return +ve */
                "       moveq.l #-1, %2\n"      /* return -ve */
                "       jra     4f\n"
                "2:     move.b  (%0),%2\n"      /* check for matching strings */
                "4:"

> #2 (optimized for minimum code size,
>     Coldfire ISA_A compatible)
>
> int strcmp2(const char *cs, const char *ct)
> {
>         int res = 0, tmp = 0;
>
>         asm ("\n"
>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>                 "       move.b  (%1)+,%3\n"     /* get *ct */
>                 "       subx.l  %3,%2\n"        /* compare a byte */
>                 "       jeq     2f\n"           /* both inputs were zero */

That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.

>                 "       tst.l   %2\n"           /* check result */

This only sets Z when it was already set by the subx.

>                 "       jeq     1b\n"           /* if zero, continue */
>                 "2:"
>                 : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
>         return res;
> }
>
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
>
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
>
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
>
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

You need a memory 'clobber' on anything that READS memory as well
as writes it.

> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
>
> Well, I now have an emulated Quadra 800 running Debian 68k. (Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
>
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.

Thinking further the fastest strcmp() probably uses big-endian word
compares with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.

        David
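
As a rough illustration of the word-compare idea above, here is a
sketch in plain C rather than m68k asm. It is not a worked-out or
benchmarked implementation. The names load_be64(), has_zero_byte() and
strcmp_words() are made up for this example, and it assumes both
buffers can be read in whole 8-byte chunks up to and including the
chunk that holds the terminator (for example, zero-padded arrays). A
real version would use aligned or misaligned word loads instead of the
byte-assembling loader shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes as a big-endian integer, so an unsigned integer
 * compare orders two words the same way a bytewise compare would. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Non-zero if any byte of x is zero (the usual subtract-and-mask trick). */
static int has_zero_byte(uint64_t x)
{
	return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* ASSUMPTION: cs and ct are readable in 8-byte chunks through the
 * chunk containing the terminating NUL. Returns -1, 0 or 1. */
static int strcmp_words(const char *cs, const char *ct)
{
	const unsigned char *a = (const unsigned char *)cs;
	const unsigned char *b = (const unsigned char *)ct;

	for (;; a += 8, b += 8) {
		uint64_t wa = load_be64(a);
		uint64_t wb = load_be64(b);

		if (!has_zero_byte(wa)) {
			if (wa == wb)
				continue;	/* 8 equal, non-zero bytes */
			/* No NUL in cs here, so the first differing byte
			 * decides, and big-endian order gives it directly. */
			return wa < wb ? -1 : 1;
		}

		/* cs ends in this chunk: finish a byte at a time. */
		for (size_t i = 0; i < 8; i++) {
			if (a[i] != b[i])
				return a[i] < b[i] ? -1 : 1;
			if (a[i] == 0)
				return 0;
		}
	}
}
```

The fast path only trusts the word compare when the chunk from cs has
no zero byte, because bytes after a terminator are not part of the
string and must not influence the result.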