Gregory Haskins wrote:
> David S. Ahern wrote:
>> Marcelo Tosatti wrote:
>>> On Fri, May 08, 2009 at 10:45:52AM -0400, Gregory Haskins wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Fri, May 08, 2009 at 10:55:37AM +0300, Avi Kivity wrote:
>>>>>> Marcelo Tosatti wrote:
>>>>>>> Also it would be interesting to see the MMIO comparison with EPT/NPT,
>>>>>>> it probably sucks much less than what you're seeing.
>>>>>>
>>>>>> Why would NPT improve mmio? If anything, it would be worse, since the
>>>>>> processor has to do the nested walk.
>>>>>>
>>>>>> Of course, these are newer machines, so the absolute results as well as
>>>>>> the difference will be smaller.
>>>>>
>>>>> Quad-Core AMD Opteron(tm) Processor 2358 SE 2.4GHz:
>>>>>
>>>>> NPT enabled:
>>>>> test 0: 3088633284634 - 3059375712321 = 29257572313
>>>>> test 1: 3121754636397 - 3088633419760 = 33121216637
>>>>> test 2: 3204666462763 - 3121754668573 = 82911794190
>>>>>
>>>>> NPT disabled:
>>>>> test 0: 3638061646250 - 3609416811687 = 28644834563
>>>>> test 1: 3669413430258 - 3638061771291 = 31351658967
>>>>> test 2: 3736287253287 - 3669413463506 = 66873789781
>>>>
>>>> Thanks for running that. Its interesting to see that NPT was in fact
>>>> worse as Avi predicted.
>>>>
>>>> Would you mind if I graphed the result and added this data to my wiki?
>>>> If so, could you adjust the tsc result into IOPs using the proper
>>>> time-base and the test_count you ran with? I can show a graph with the
>>>> data as is and the relative differences will properly surface..but it
>>>> would be nice to have apples to apples in terms of IOPS units with my
>>>> other run.
>>>>
>>>> -Greg
>>>
>>> Please, that'll be nice.
>>>
>>> Quad-Core AMD Opteron(tm) Processor 2358 SE
>>>
>>> host: 2.6.30-rc2
>>> guest: 2.6.29.1-102.fc11.x86_64
>>>
>>> test_count=1000000, tsc freq=2402882804 Hz
>>>
>>> NPT disabled:
>>>
>>> test 0 = 2771200766
>>> test 1 = 3018726738
>>> test 2 = 6414705418
>>> test 3 = 2890332864
>>>
>>> NPT enabled:
>>>
>>> test 0 = 2908604045
>>> test 1 = 3174687394
>>> test 2 = 7912464804
>>> test 3 = 3046085805
>>
>> DL380 G6, 1-E5540, 6 GB RAM, SMT enabled:
>> host: 2.6.30-rc3
>> guest: fedora 9, 2.6.27.21-78.2.41.fc9.x86_64
>>
>> with EPT
>> test 0: 543617607291 - 518146439877 = 25471167414
>> test 1: 572568176856 - 543617703004 = 28950473852
>> test 2: 630182158139 - 572568269792 = 57613888347
>>
>> without EPT
>> test 0: 1383532195307 - 1358052032086 = 25480163221
>> test 1: 1411587055210 - 1383532318617 = 28054736593
>> test 2: 1471446356172 - 1411587194600 = 59859161572
>
> Thank you kindly, David.
>
> -Greg

I ran another test case with SMT disabled, and while I was at it
converted the TSC deltas to operations/sec. The results without SMT are
confusing -- to me anyways. I'm hoping someone can explain them.

Basically, using a count of 10,000,000 (per your web page) with SMT
disabled, the guest detected a soft lockup on the CPU. So I dropped the
count down to 1,000,000.

So, for 1e6 iterations:

without SMT, with EPT:
    HC:   259,455 ops/sec
    PIO:  226,937 ops/sec
    MMIO: 113,180 ops/sec

without SMT, without EPT:
    HC:   274,825 ops/sec
    PIO:  247,910 ops/sec
    MMIO: 111,535 ops/sec

Converting the prior TSC deltas:

with SMT, with EPT:
    HC:   994,655 ops/sec
    PIO:  875,116 ops/sec
    MMIO: 439,738 ops/sec

with SMT, without EPT:
    HC:   994,304 ops/sec
    PIO:  903,057 ops/sec
    MMIO: 423,244 ops/sec

Running the tests repeatedly, I did notice a fair amount of variability
(results as much as 10% below these numbers).
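
For anyone who wants to redo the conversion, here is a minimal sketch of
it, computing cpu_freq / dTSC * count. It is applied to Marcelo's Opteron
numbers quoted above, since that run states both test_count and the TSC
frequency. The function and variable names are made up for illustration,
and it assumes the TSC ticks at the constant rate given.

```python
def tsc_delta_to_ops_per_sec(cpu_freq_hz, d_tsc, count):
    """Convert a TSC delta into operations/sec.

    d_tsc / cpu_freq_hz is the elapsed time in seconds, so
    ops/sec = count / (d_tsc / cpu_freq_hz) = cpu_freq_hz / d_tsc * count.
    """
    if d_tsc <= 0:
        raise ValueError("TSC delta must be positive")
    return cpu_freq_hz / d_tsc * count


# Values copied from the quoted Opteron 2358 SE run.
TSC_FREQ_HZ = 2_402_882_804
TEST_COUNT = 1_000_000

NPT_DISABLED = {
    "test 0": 2_771_200_766,
    "test 1": 3_018_726_738,
    "test 2": 6_414_705_418,
    "test 3": 2_890_332_864,
}
NPT_ENABLED = {
    "test 0": 2_908_604_045,
    "test 1": 3_174_687_394,
    "test 2": 7_912_464_804,
    "test 3": 3_046_085_805,
}


def convert(deltas, cpu_freq_hz=TSC_FREQ_HZ, count=TEST_COUNT):
    """Map each test name to its ops/sec figure."""
    return {
        name: tsc_delta_to_ops_per_sec(cpu_freq_hz, d_tsc, count)
        for name, d_tsc in deltas.items()
    }


if __name__ == "__main__":
    for label, deltas in (("NPT disabled", NPT_DISABLED),
                          ("NPT enabled", NPT_ENABLED)):
        print(label)
        for name, ops in convert(deltas).items():
            print(f"    {name}: {ops:,.0f} ops/sec")
```

The same function should reproduce the with-SMT figures above if given
the E5540's TSC frequency and the count used for that run, neither of
which is stated exactly in this thread.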

Also, just to make sure I converted the delta to ops/sec correctly, the
formula I used was:

    cpu_freq / dTSC * count = operations/sec

david