On Mon, 2012-06-11 at 11:09 +0300, Avi Kivity wrote:
> On 06/08/2012 04:20 PM, Sasha Levin wrote:
> > I re-ran benchmarks in a single user environment to get more stable
> > results, increasing the test files to 50gb each.
> >
> > First, a test of the good case scenario for KVM TMEM - we'll try
> > streaming a file which compresses well but is bigger than the host RAM
> > size:
> >
> > First, no KVM TMEM, caching=none:
> >
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 116.309 s, 73.9 MB/s
> >
> > real    1m56.349s
> > user    0m0.015s
> > sys     0m15.671s
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 116.191 s, 73.9 MB/s
> >
> > real    1m56.255s
> > user    0m0.018s
> > sys     0m15.504s
> >
> > Now, no KVM TMEM, caching=writeback:
> >
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 122.894 s, 69.9 MB/s
> >
> > real    2m2.965s
> > user    0m0.015s
> > sys     0m11.025s
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 110.915 s, 77.4 MB/s
> >
> > real    1m50.968s
> > user    0m0.011s
> > sys     0m10.108s
>
> Strange that system time is lower with cache=writeback.

Maybe because these pages don't get written out immediately? I don't have
a better guess.

> > And finally, KVM TMEM on, caching=none:
> >
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 119.024 s, 72.2 MB/s
> >
> > real    1m59.123s
> > user    0m0.020s
> > sys     0m29.336s
> >
> > sh-4.2# time dd if=test/zero of=/dev/null bs=4M count=2048
> > 2048+0 records in
> > 2048+0 records out
> > 8589934592 bytes (8.6 GB) copied, 36.8798 s, 233 MB/s
> >
> > real    0m36.950s
> > user    0m0.005s
> > sys     0m35.308s
>
> So system time more than doubled compared to non-tmem cache=none. The
> overhead per page is 17s / (8589934592/4096) = 8.1usec. Seems quite high.

Right, but consider that it didn't increase real time at all.

> 'perf top' while this is running would be interesting.

I'll update later with this.

> > This is a snapshot of kvm_stats while the 2nd run in the KVM TMEM test
> > was going:
> >
> > kvm statistics
> >
> >  kvm_exit                         1952342   36037
> >  kvm_entry                        1952334   36034
> >  kvm_hypercall                    1710568   33948
>
> In that test, 56k pages/sec were transferred. Why are we seeing only
> 33k hypercalls/sec? Shouldn't we have two hypercalls/page (one when
> evicting a page to make some room, one to read the new page from tmem)?

The guest doesn't do eviction at all - in fact, it doesn't know how big
the cache is, so even if it wanted to, it couldn't evict pages (the only
thing it does is invalidate pages which have changed in the guest). This
means it takes only one hypercall/page instead of two.
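
(For reference, this is roughly the arithmetic behind the per-page and
per-second figures quoted above - just a back-of-the-envelope restatement
of numbers already in this thread, not new measurements:)

# 17s of extra sys time spread over the 4 KiB pages of the 8.6 GB file
$ echo "scale=1; 17 * 10^6 / (8589934592 / 4096)" | bc
8.1
# page rate implied by 233 MB/s, next to the ~34k hypercalls/sec above
$ echo "233 * 10^6 / 4096" | bc
56884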
> > Now, for the worst case "streaming test". I've tried streaming two
> > files, one which has good compression (zeros), and one full with random
> > bits. Doing two runs for each.
> >
> > First, the baseline - no KVM TMEM, caching=none:
> >
> > Zero file:
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 703.502 s, 76.3 MB/s
> >
> > real    11m43.583s
> > user    0m0.106s
> > sys     1m42.075s
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 691.208 s, 77.7 MB/s
> >
> > real    11m31.284s
> > user    0m0.100s
> > sys     1m41.235s
> >
> > Random file:
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 655.778 s, 80.6 MB/s
> >
> > real    10m55.847s
> > user    0m0.107s
> > sys     1m39.852s
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 652.668 s, 80.9 MB/s
> >
> > real    10m52.739s
> > user    0m0.120s
> > sys     1m39.712s
> >
> > Now, this is with zcache enabled in the guest (not going through KVM
> > TMEM), caching=none:
> >
> > Zeros:
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 704.479 s, 76.2 MB/s
> >
> > real    11m44.536s
> > user    0m0.088s
> > sys     2m0.639s
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 690.501 s, 77.8 MB/s
> >
> > real    11m30.561s
> > user    0m0.088s
> > sys     1m57.637s
>
> zcache appears not to be helping at all; it's just adding overhead. Is
> even the compressed file too large?
>
> overhead = 1.4 usec/page.

Correct - I had to further increase the size of this file so that zcache
would fail here as well. The good case was tested before; here I wanted to
see what happens with files that don't benefit much from either regular
caching or zcache.

> > Random:
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 656.436 s, 80.5 MB/s
> >
> > real    10m56.480s
> > user    0m0.034s
> > sys     3m18.750s
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 658.446 s, 80.2 MB/s
> >
> > real    10m58.499s
> > user    0m0.046s
> > sys     3m23.678s
>
> Overhead grows to 7.6 usec/page.
>
> > Next, with KVM TMEM enabled, caching=none:
> >
> > Zeros:
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 711.754 s, 75.4 MB/s
> >
> > real    11m51.916s
> > user    0m0.081s
> > sys     2m59.952s
> > 12800+0 records in
> > 12800+0 records out
> > 53687091200 bytes (54 GB) copied, 690.958 s, 77.7 MB/s
> >
> > real    11m31.102s
> > user    0m0.082s
> > sys     3m6.500s
>
> Overhead = 6.6 usec/page.
>
> > Random:
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 656.378 s, 80.5 MB/s
> >
> > real    10m56.445s
> > user    0m0.062s
> > sys     5m53.236s
> > 12594+1 records in
> > 12594+1 records out
> > 52824875008 bytes (53 GB) copied, 653.353 s, 80.9 MB/s
> >
> > real    10m53.404s
> > user    0m0.066s
> > sys     5m57.087s
>
> Overhead = 19 usec/page.
>
> This is pretty steep. We have flash storage doing a million iops/sec,
> and here you add 19 microseconds to that.

Might be interesting to test it with flash storage as well...

> > This is a snapshot of kvm_stats while this test was running:
> >
> > kvm statistics
> >
> >  kvm_entry                         168179   20729
> >  kvm_exit                          168179   20728
> >  kvm_hypercall                     131808   16409
>
> The last test was running 19k pages/sec, doesn't quite fit with this
> measurement. Is the measurement stable or fluctuating?

It's pretty stable when running the "zero" pages, but when switching to
the random file it fluctuates somewhat.
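
(The same sort of rough arithmetic for the streaming runs - the sizes and
sys times come from the runs above, and the baseline I subtract is
approximate:)

# ~80 MB/s of 4 KiB pages, next to the ~16.4k hypercalls/sec above
$ echo "80 * 10^6 / 4096" | bc
19531
# random-data run with KVM TMEM: sys time went from ~100s to ~355s,
# spread over the pages of the 53 GB file - close to the 19 usec/page
# figure quoted above
$ echo "scale=1; (355 - 100) * 10^6 / (52824875008 / 4096)" | bc
19.7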
> > And finally, KVM TMEM enabled, with caching=writeback:
>
> I'm not sure what the point of this is? You have two host-caching
> mechanisms running in parallel, are you trying to increase overhead
> while reducing effective cache size?

I thought that you had asked for this test:

On Wed, 2012-06-06 at 16:24 +0300, Avi Kivity wrote:
> while cache=writeback with cleancache enabled in the host should
> give the same effect, but with the extra hypercalls, but with an extra
> copy to manage the host pagecache. It would be good to see results for
> all three settings.

> My conclusion is that the overhead is quite high, but please double
> check my numbers, maybe I missed something obvious.

I'm not sure what options I have to lower the overhead here - should I be
using something other than hypercalls to communicate with the host? I know
that there are several things being worked on from the zcache perspective
(WasActive, batching, etc.), but is there something that could be done
within the scope of kvm-tmem?

It would also be interesting to see results for Xen/TMEM and to compare
them with these numbers.