On 06/08/2012 04:20 PM, Sasha Levin wrote:
> I re-ran benchmarks in a single user environment to get more stable results, increasing the test files to 50gb each.
>
> First, a test of the good case scenario for KVM TMEM - we'll try streaming a file which compresses well but is bigger than the host RAM size:
>
> First, no KVM TMEM, caching=none:
>
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 116.309 s, 73.9 MB/s
>
> real    1m56.349s
> user    0m0.015s
> sys     0m15.671s
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 116.191 s, 73.9 MB/s
>
> real    1m56.255s
> user    0m0.018s
> sys     0m15.504s
>
> Now, no KVM TMEM, caching=writeback:
>
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 122.894 s, 69.9 MB/s
>
> real    2m2.965s
> user    0m0.015s
> sys     0m11.025s
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 110.915 s, 77.4 MB/s
>
> real    1m50.968s
> user    0m0.011s
> sys     0m10.108s

Strange that system time is lower with cache=writeback.

>
> And finally, KVM TMEM on, caching=none:
>
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 119.024 s, 72.2 MB/s
>
> real    1m59.123s
> user    0m0.020s
> sys     0m29.336s
>
> sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> 2048+0 records in
> 2048+0 records out
> 8589934592 bytes (8.6 GB) copied, 36.8798 s, 233 MB/s
>
> real    0m36.950s
> user    0m0.005s
> sys     0m35.308s

So system time more than doubled compared to non-tmem cache=none. The overhead per page is 17s / (8589934592/4096) = 8.1 usec. Seems quite high. 'perf top' while this is running would be interesting.

>
> This is a snapshot of kvm_stats while the 2nd run in the KVM TMEM test was going:
>
> kvm statistics
>
> kvm_exit           1952342   36037
> kvm_entry          1952334   36034
> kvm_hypercall      1710568   33948

In that test, 56k pages/sec were transferred. Why are we seeing only 33k hypercalls/sec? Shouldn't we have two hypercalls/page (one when evicting a page to make some room, one to read the new page from tmem)?
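Back-of-the-envelope, as a minimal sketch in Python: it assumes the ~17 s of extra system time and the two-hypercalls-per-page model described above, with the other figures taken from the quoted dd and kvm_stat output.

# Rough sanity check of the figures above. The 17 s extra system time and
# the two-hypercalls-per-page expectation are the assumptions stated in the
# comments; file size, throughput and kvm_stat rates are from the quoted output.

PAGE_SIZE = 4096                  # bytes per guest page
FILE_BYTES = 8589934592           # the 8.6 GB test file

pages = FILE_BYTES / PAGE_SIZE    # ~2.1 million pages
extra_sys_s = 17.0                # extra guest system time vs. non-tmem cache=none
print(f"overhead per page: {extra_sys_s / pages * 1e6:.1f} usec")   # ~8.1 usec

# Hypercall-rate cross-check for the cached 233 MB/s run:
pages_per_sec = 233e6 / PAGE_SIZE            # ~57k pages/sec
expected_hypercalls = 2 * pages_per_sec      # one to evict a page, one to read the new one
observed_hypercalls = 33948                  # kvm_hypercall rate from the kvm_stat snapshot
print(f"expected ~{expected_hypercalls:,.0f}/s, observed {observed_hypercalls:,}/s")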
>
>
> Now, for the worst case "streaming test". I've tried streaming two files, one which has good compression (zeros), and one full of random bits. Doing two runs for each.
>
> First, the baseline - no KVM TMEM, caching=none:
>
> Zero file:
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 703.502 s, 76.3 MB/s
>
> real    11m43.583s
> user    0m0.106s
> sys     1m42.075s
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 691.208 s, 77.7 MB/s
>
> real    11m31.284s
> user    0m0.100s
> sys     1m41.235s
>
> Random file:
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 655.778 s, 80.6 MB/s
>
> real    10m55.847s
> user    0m0.107s
> sys     1m39.852s
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 652.668 s, 80.9 MB/s
>
> real    10m52.739s
> user    0m0.120s
> sys     1m39.712s
>
> Now, this is with zcache enabled in the guest (not going through KVM TMEM), caching=none:
>
> Zeros:
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 704.479 s, 76.2 MB/s
>
> real    11m44.536s
> user    0m0.088s
> sys     2m0.639s
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 690.501 s, 77.8 MB/s
>
> real    11m30.561s
> user    0m0.088s
> sys     1m57.637s

zcache appears not to be helping at all; it's just adding overhead. Is even the compressed file too large? Overhead = 1.4 usec/page.

>
> Random:
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 656.436 s, 80.5 MB/s
>
> real    10m56.480s
> user    0m0.034s
> sys     3m18.750s
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 658.446 s, 80.2 MB/s
>
> real    10m58.499s
> user    0m0.046s
> sys     3m23.678s

Overhead grows to 7.6 usec/page.

>
> Next, with KVM TMEM enabled, caching=none:
>
> Zeros:
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 711.754 s, 75.4 MB/s
>
> real    11m51.916s
> user    0m0.081s
> sys     2m59.952s
> 12800+0 records in
> 12800+0 records out
> 53687091200 bytes (54 GB) copied, 690.958 s, 77.7 MB/s
>
> real    11m31.102s
> user    0m0.082s
> sys     3m6.500s

Overhead = 6.6 usec/page.

>
> Random:
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 656.378 s, 80.5 MB/s
>
> real    10m56.445s
> user    0m0.062s
> sys     5m53.236s
> 12594+1 records in
> 12594+1 records out
> 52824875008 bytes (53 GB) copied, 653.353 s, 80.9 MB/s
>
> real    10m53.404s
> user    0m0.066s
> sys     5m57.087s

Overhead = 19 usec/page. This is pretty steep. We have flash storage doing a million IOPS, and here you add 19 microseconds to that.

>
>
> This is a snapshot of kvm_stats while this test was running:
>
> kvm statistics
>
> kvm_entry           168179   20729
> kvm_exit            168179   20728
> kvm_hypercall       131808   16409

The last test was running 19k pages/sec, which doesn't quite fit with this measurement. Is the measurement stable or fluctuating?

>
> And finally, KVM TMEM enabled, with caching=writeback:

I'm not sure what the point of this is. You have two host-caching mechanisms running in parallel; are you trying to increase overhead while reducing effective cache size?

My conclusion is that the overhead is quite high, but please double check my numbers, maybe I missed something obvious.

-- 
error compiling committee.c: too many arguments to function
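P.S. For double-checking the streaming-test numbers, a minimal Python sketch of the same arithmetic, assuming overhead per page = (sys time of the test run - sys time of the no-tmem, caching=none baseline) / number of pages, using the first run of each quoted test; file sizes and timings are taken from the dd output above.

# Recomputing the streaming-test per-page overheads from the quoted sys times.
PAGE_SIZE = 4096
ZERO_FILE = 53687091200       # "zeros" file, 54 GB
RANDOM_FILE = 52824875008     # random file, 53 GB

def overhead_usec(file_bytes, baseline_sys_s, test_sys_s):
    """Extra guest system time per page, in microseconds."""
    return (test_sys_s - baseline_sys_s) / (file_bytes / PAGE_SIZE) * 1e6

print(overhead_usec(ZERO_FILE,   102.075, 120.639))   # zcache, zeros    -> ~1.4
print(overhead_usec(RANDOM_FILE,  99.852, 198.750))   # zcache, random   -> ~7.7
print(overhead_usec(ZERO_FILE,   102.075, 179.952))   # KVM TMEM, zeros  -> ~5.9
print(overhead_usec(RANDOM_FILE,  99.852, 353.236))   # KVM TMEM, random -> ~19.6

# Hypercall-rate check for the last kvm_stat snapshot (80.9 MB/s run):
pages_per_sec = 80.9e6 / PAGE_SIZE     # ~19.7k pages/sec
print(2 * pages_per_sec, "expected vs", 16409, "observed hypercalls/sec")

These land in the same ballpark as the 1.4 / 7.6 / 6.6 / 19 usec/page figures above; the exact value shifts a little depending on which run is paired with which baseline.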