On Mon, Oct 5, 2015 at 9:48 PM, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> Although I was probably wrong about the source of the overhead, the
> point still remains that the prefaulting is eating cycles for no
> practical benefit.

Yeah, no, I'm not disagreeing with that part, I'm just more of a "at
this point in the rc series we are probably better off reverting".

Your ext4 patch may well fix the issue, and be the right thing to do
(_regardless_ of the revert, in fact - while it might make the revert
unnecessary, it might also be a good idea even if we do revert).

The subtlety of this just worries me, and the reason I'd still be
inclined to revert is simply "it's been that way a long time, the safe
thing is to go back and take this slow".

> With "-e cycles:pp":
>
>>        │      sub    $0x8,%rsp
>>  24.57 │      stac
>>  15.49 │      mov    (%rcx),%sil
>>  29.06 │      clac
>>   2.24 │      test   %eax,%eax
>>   8.77 │      mov    %sil,-0x1(%rbp)
>>   2.22 │    ↓ jne    66
>>        │      movslq %edx,%rdx

Ok, so it really is the stac/clac that is the bulk of the cost.

Hmm. You're right that the loop there will only be executed once for
your case, so moving the stac/clac outside probably doesn't help. It
*might* still make a difference just for microarchitectural reasons
(ie they may cause more trouble just because they are close to an
instruction that depends on them), but it's questionable.

It is a bit worrisome to see that those things are so expensive. Right
now almost all user accesses will cause *lots* of clac/stac stuff. I
originally asked Intel to do SMAP using a segment prefix, but that was
not to be..

                Linus