On Tue, 2012-04-10 at 16:33 -0700, H. Peter Anvin wrote:
> > Just wanted to mention that handling the detect zeroes operations on
> > cpus that require alignment is easy, just rewind the pointer at the
> > beginning to be aligned and "or" in a mask of 0xff for each alignment
> > pad byte into the initially loaded word.
> >
>
> Even on machines which don't require alignment it will still be faster
> to do aligned memory references only, not counting the startup cost
> (which is substantial in this case, of course, since the average length
> is so short.) However, it also neatly avoids the page overrun problem.

I'm leaning toward that too, but I want to do some benches. The main
issues for me are:

 - I have to deal with a reasonably wide range of different cores which
will handle unaligned accesses very differently. Almost all will do it
in HW but with widely varying performance, and some will occasionally
trap (SW emulation kicks in but that's extremely slow). The trapping
case is generally rare though; depending on the core, it will happen on
things like page boundaries or segment boundaries. I also suspect that
the byte-reverse load/store instructions will suck more at unaligned.

 - The page overrun is an issue. On 64-bit we don't have anything mapped
past the end of the linear mapping and on 32-bit we fall into ioremap
space. That's fixable with a quick hack to add one more page to the
linear mapping, creating a double mapping of either page 0 or any random
page of memory. I don't have cache aliases or anything like that to
worry about, but it's gross.

Anyways, I'll try to play around if I get time, might have to wait for
next week tho, I have some more urgent stuff to sort out and I'm off
Friday to Tuesday.

Cheers,
Ben.
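For reference, the aligned-load trick quoted at the top (rewind the pointer to a word boundary, then "or" 0xff into each pad byte of the first word) could look roughly like the sketch below. This is an illustration only, not the kernel's code: `aligned_strlen` and `has_zero_byte` are hypothetical names, the word size is assumed to be 8 bytes, and it is written as plain user-space C using `memcpy` for the loads. The pad mask and the final byte search are done bytewise so the sketch does not depend on endianness. Since every load is a full aligned word, no access can straddle a page boundary, which is why this approach sidesteps the page overrun problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((uint64_t)0x0101010101010101ULL)
#define HIGHS ((uint64_t)0x8080808080808080ULL)

/* Nonzero iff at least one byte of w is 0x00 (classic bit trick). */
static int has_zero_byte(uint64_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/*
 * Hypothetical helper: strlen() that only issues aligned 8-byte loads.
 * Assumes the whole aligned word around every byte of the string is
 * readable, which holds for anything that lives inside a mapped page.
 */
static size_t aligned_strlen(const char *s)
{
    size_t pad, i;
    const unsigned char *p;
    unsigned char bytes[sizeof(uint64_t)];
    uint64_t w, mask;

    assert(s != NULL);

    /* Rewind to the start of the aligned word that contains s. */
    pad = (uintptr_t)s & (sizeof(uint64_t) - 1);
    p = (const unsigned char *)s - pad;

    /* Mask with 0xff in each pad byte, built in memory order. */
    memset(bytes, 0, sizeof(bytes));
    memset(bytes, 0xff, pad);
    memcpy(&mask, bytes, sizeof(mask));

    /* First word: pad bytes are forced non-zero so they cannot match. */
    memcpy(&w, p, sizeof(w));
    w |= mask;

    /* Remaining words are already aligned; no masking needed. */
    while (!has_zero_byte(w)) {
        p += sizeof(w);
        memcpy(&w, p, sizeof(w));
    }

    /* Find the first zero byte of the final word, in memory order. */
    memcpy(bytes, &w, sizeof(bytes));
    i = 0;
    while (bytes[i] != 0)
        i++;

    return (size_t)(p + i - (const unsigned char *)s);
}
```

A real implementation would replace the final byte loop with an endian-specific bit operation and would need benchmarking per core, as discussed above; the sketch only shows why the pad mask makes the rewind safe.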