On Wed, May 23, 2012 at 1:36 PM, David Miller <davem@xxxxxxxxxxxxx> wrote:
>
> I toyed around with some of the ideas we discussed but gcc really
> mishandled all the approaches I tried.

Have you tried coding them as ?: expressions, along with making all
the temporaries separate variables? Sometimes that seems to make gcc
more eager to use cmov's.

Although that seemed to work better before. These days gcc sometimes
seems so eager to show it knows better than the programmer that it is
hard to make it do the obvious thing from the obvious source code..

> 1) In the loop, use the test:
>
>        (x + 0xfefefeff) & ~(x | 0x7f7f7f7f)
>
>    It's the same effective cost as the current test (on sparc
>    it would be ADD, OR, ANDNCC).
>
>    We make sure to calculate the "x | 0x7f7f7f7f" part into
>    a variable which is not clobbered by the rest of the test.
>
>    This is so we can reuse it in #2.
>
> 2) Once we find a word containing the zero byte, do a:
>
>        ~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)
>
>    and that "x | 0x7f7f7f7f" part is already calculated and thus
>    can be cribbed the place we left it in #1 above.
>
>    And now we'll have exactly a 0x80 where there is a zero byte,
>    and no bleeding of 0x80 values into adjacent byte positions.
>
>    Once we have that we can just test that mask directly for the
>    zero byte location search code.

Sounds likely, and you only have two different constants to worry about.

Sadly, I don't see any way to get the "only high bits set" cheaply,
like the little-endian case does (ie going from "zero in second byte
and after": 0x00808080 to the byte mask you need: 0xff000000).

If you had that, and the appropriate unaligneds, you'd also have
everything for the dcache case, not just strncpy.

                Linus
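
For readers following along, the two quoted tests can be sketched in portable C roughly as below. This is an illustration only, not the sparc code under discussion: the function names, the 32-bit word size, and the byte-by-byte index search are all assumptions made for the example. It treats the most significant byte as the first byte in memory (big-endian), which is why the cheap test's stray 0x80 bits in higher bytes would point at the wrong byte if used directly.

```c
#include <assert.h>
#include <stdint.h>

#define LOW7       0x7f7f7f7fU  /* low seven bits of every byte */
#define MINUS_ONES 0xfefefeffU  /* -0x01010101 modulo 2^32 */

/* Test #1: nonzero iff x contains a zero byte. The borrow can also set
 * 0x80 in bytes above the zero byte. Takes the precomputed x | LOW7. */
static uint32_t has_zero_fast(uint32_t x, uint32_t x_or_low7)
{
    return (x + MINUS_ONES) & ~x_or_low7;
}

/* Test #2: exactly 0x80 in each zero byte, nothing elsewhere.
 * Reuses the same x | LOW7 value computed for test #1. */
static uint32_t zero_mask_exact(uint32_t x, uint32_t x_or_low7)
{
    return ~(((x & LOW7) + LOW7) | x_or_low7);
}

/* Hypothetical helper: big-endian index of the first zero byte
 * (0 = most significant byte), or -1 if the mask is empty. */
static int first_zero_index_be(uint32_t mask)
{
    if (mask & 0x80000000U) return 0;
    if (mask & 0x00800000U) return 1;
    if (mask & 0x00008000U) return 2;
    if (mask & 0x00000080U) return 3;
    return -1;
}

/* Worked example: bytes 0x61 0x01 0x00 0x63 in big-endian order. */
static void check_example(void)
{
    uint32_t x = 0x61010063U;
    uint32_t t = x | LOW7;                          /* computed once */

    assert(has_zero_fast(x, t) == 0x00808000U);     /* bleeds into the 0x01 byte */
    assert(zero_mask_exact(x, t) == 0x00008000U);   /* only the real zero byte */
    assert(first_zero_index_be(zero_mask_exact(x, t)) == 2);
}
```

In the example word, the cheap test flags both the 0x01 byte and the real zero byte, while the second expression leaves a single 0x80 at the zero byte, so the index search can trust the mask.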