On 01/10/2013 23:44, Benjamin Herrenschmidt wrote:
> On Tue, 2013-10-01 at 13:19 +0200, Paolo Bonzini wrote:
>> On 01/10/2013 11:38, Benjamin Herrenschmidt wrote:
>>> So for the sake of that dogma you are going to make us do something that
>>> is about 100 times slower ? (and possibly involves more lines of code)
>>
>> If it's 100 times slower there is something else that's wrong. It's
>> most likely not 100 times slower, and this makes me wonder if you or
>> Michael actually timed the code at all.
>
> So no we haven't measured. But it is going to be VERY VERY VERY much
> slower. Our exit latencies are bad with our current MMU *and* any exit
> is going to cause all secondary threads on the core to have to exit as
> well (remember P7 is 4 threads, P8 is 8)

OK, this is indeed the main difference between Power and x86.

>> 100 cycles bare metal rdrand
>> 2000 cycles guest->hypervisor->guest
>> 15000 cycles guest->userspace->guest
>>
>> (100 cycles = 40 ns = 200 MB/sec; 2000 cycles = ~1 microsecond; 15000
>> cycles = ~7.5 microseconds). Even on 5-year-old hardware, a userspace
>> roundtrip is around a dozen microseconds.
>
> So in your case going to qemu to "emulate" rdrand would indeed be 150
> times slower, I don't see in what universe that would be considered a
> good idea.

rdrand is not privileged on x86; guests can use it. But my point is
that going to the kernel is already 20 times slower. Getting entropy
(not just a pseudo-random number seeded by the HWRNG) with rdrand is
~1000 times slower according to Intel's recommendations, so the
roundtrip to userspace is entirely invisible in that case.

The numbers for PPC seem to be a bit different, though (it's faster to
read entropy, and slower to do a userspace exit).

> It's a random number obtained from sampling a set of oscillators. It's
> slightly biased but we have very simple code (I believe shared with the
> host kernel implementation) for whitening it as is required by PAPR.

Good.
Actually, passing the dieharder tests does not mean much (an
AES-encrypted counter should also pass them with flying colors), but
if it's specified by the architecture gods it's likely to have received
some scrutiny.

>> 2) If the hwrng returns entropy, a read from the hwrng is going to be
>> even more expensive than an x86 rdrand (perhaps ~2000 cycles).
>
> Depends how often you read, the HW I think is sampling asynchronously so
> you only block on the MMIO if you already consumed the previous sample
> but I'll let Paulus provide more details here.

Given Paul's description, there's indeed very little extra cost compared
to a "nop" hypercall. That's nice.

Still, considering that the QEMU code has to be there anyway for
compatibility, kernel emulation is not particularly necessary IMHO. I
would of course like to see actual performance numbers, but besides
that, are you ever going to see this in the profile except if you run
"dd if=/dev/hwrng of=/dev/null"? Can you instrument pHyp to find out
how many times per second this hypercall is called by a "normal" Linux
or AIX guest?

>> 3) If the hypercall returns random numbers, then it is a pretty
>> braindead interface since returning 8 bytes at a time limits the
>> throughput to a handful of MB/s (compare to 200 MB/sec for x86 rdrand).
>> But more important: in this case drivers/char/hw_random/pseries-rng.c
>> is completely broken and insecure, just like patch 2 in case (1) above.
>
> How so ?

Paul confirmed that it returns real entropy, so this is moot.

Paolo
-- 
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
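The cycle figures quoted in this thread (100 cycles for bare-metal rdrand, 2000 for a guest->hypervisor->guest exit, 15000 for a guest->userspace->guest exit) can be turned into latency, slowdown, and the throughput ceiling of an interface that hands back 8 bytes per call. What follows is a minimal back-of-the-envelope sketch, not a benchmark. It assumes a 2.5 GHz clock (the rate implied by "100 cycles = 40 ns") and exactly 8 bytes per call; the names `PATHS`, `latency_us`, `throughput_mb_s` and `slowdown` are hypothetical.

```python
# Back-of-the-envelope cost model for the cycle counts quoted in the thread.
# Assumption: a 2.5 GHz clock, i.e. the rate implied by "100 cycles = 40 ns".
CLOCK_HZ = 2.5e9
# Assumption: each call (rdrand or the hypercall) returns 8 bytes.
BYTES_PER_CALL = 8

# Round-trip cost in cycles for each path, as quoted in the thread.
PATHS = {
    "bare metal rdrand": 100,
    "guest->hypervisor->guest": 2000,
    "guest->userspace->guest": 15000,
}


def latency_us(cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a cycle count into microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


def throughput_mb_s(cycles: int, clock_hz: float = CLOCK_HZ,
                    bytes_per_call: int = BYTES_PER_CALL) -> float:
    """Upper bound on MB/s when every bytes_per_call read costs `cycles`."""
    seconds_per_call = cycles / clock_hz
    return bytes_per_call / seconds_per_call / 1e6


def slowdown(cycles: int, baseline: int = PATHS["bare metal rdrand"]) -> float:
    """How many times slower a path is than the bare-metal baseline."""
    return cycles / baseline


if __name__ == "__main__":
    for name, cycles in PATHS.items():
        print(f"{name:26s} {latency_us(cycles):6.2f} us "
              f"{throughput_mb_s(cycles):7.1f} MB/s {slowdown(cycles):5.0f}x")
```

Run as-is, it reports about 0.04 microseconds, 200 MB/s and 1x for bare-metal rdrand; 0.8 microseconds, 10 MB/s and 20x for a hypervisor exit; and 6 microseconds, about 1.3 MB/s and 150x for a userspace exit. The quoted ~7.5 microseconds corresponds to a 2 GHz clock instead. Either way the 20x and 150x ratios hold, and so does the "handful of MB/s" ceiling for an interface returning 8 bytes per call.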