On Thu, Apr 04, 2013 at 06:36:30PM +0300, Michael S. Tsirkin wrote:
> > processor       : 0
> > vendor_id       : AuthenticAMD
> > cpu family      : 16
> > model           : 8
> > model name      : Six-Core AMD Opteron(tm) Processor 8435
> > stepping        : 0
> > cpu MHz         : 800.000
> > cache size      : 512 KB
> > physical id     : 0
> > siblings        : 6
> > core id         : 0
> > cpu cores       : 6
> > apicid          : 8
> > initial apicid  : 0
> > fpu             : yes
> > fpu_exception   : yes
> > cpuid level     : 5
> > wp              : yes
> > flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save pausefilter
> > bogomips        : 5199.87
> > TLB size        : 1024 4K pages
> > clflush size    : 64
> > cache_alignment : 64
> > address sizes   : 48 bits physical, 48 bits virtual
> > power management: ts ttp tm stc 100mhzsteps hwpstate
>
> Hmm, the svm code seems less optimized for MMIO, but PIO
> is almost identical. Gleb says the unittest is broken
> on AMD so I'll wait until it's fixed to test.
>
It's not that the unittest is broken, it's my environment that is broken :)
Did you do PIO reads by chance?

> > > Or it could be different software; this is on top of 3.9.0-rc5. What
> > > did you try?
> >
> > 3.0 plus kvm-kmod of whatever was current back in autumn :).
> >
> > >> MST, could you please do a real-world latency benchmark with virtio-net and
> > >>
> > >>  * normal ioeventfd
> > >>  * mmio-pv eventfd
> > >>  * hcall eventfd
> > >
> > > I can't do this right away, sorry. For MMIO we are discussing the new
> > > layout on the virtio mailing list; guest and qemu need a patch for this
> > > too. My hcall patches are stale and would have to be brought up to
> > > date.
> > >
> > >> to give us some idea how much performance we would gain from each approach?
> > >> Throughput should be completely unaffected anyway, since virtio just
> > >> coalesces kicks internally.
> > >
> > > Latency is dominated by the scheduling latency.
> > > This means virtio-net is not the best benchmark.
> >
> > So what is a good benchmark?
>
> E.g. a ping-pong stress test will do, but you need to look at CPU
> utilization; that's what is affected, not latency.
>
> > Is there any difference in speed at all? I strongly doubt it. One of
> > virtio's main points is to reduce the number of kicks.
>
> For this stage of the project I think microbenchmarks are more appropriate.
> Doubling the price of an exit is likely to be measurable. 30 cycles likely
> not ...
>
> > >> I'm also slightly puzzled why the wildcard eventfd mechanism is so
> > >> significantly slower, while it was only a few percent on my test
> > >> system. What are the numbers you're listing above? Cycles? How many
> > >> cycles do you execute in a second?
> > >>
> > >>
> > >> Alex
> > >
> > > It's the TSC divided by the number of iterations. kvm unittest reports
> > > this value; here's what it does (removed some dead code):
> > >
> > > #define GOAL (1ull << 30)
> > >
> > > do {
> > >     iterations *= 2;
> > >     t1 = rdtsc();
> > >
> > >     for (i = 0; i < iterations; ++i)
> > >         func();
> > >     t2 = rdtsc();
> > > } while ((t2 - t1) < GOAL);
> > > printf("%s %d\n", test->name, (int)((t2 - t1) / iterations));
> >
> > So it's the number of cycles per run.
> >
> > That means, translated, my numbers are:
> >
> > MMIO:  4307
> > PIO:   3658
> > HCALL: 1756
> >
> > MMIO - PIO = 649
> >
> > which aligns roughly with your PV MMIO callback.
> >
> > My MMIO benchmark was to poke the LAPIC version register. That does go
> > through instruction emulation, no?
> >
> >
> > Alex
> >
Why wouldn't it?
Intel decodes access to the apic page, but we use it only for fast eoi.

--
	Gleb.