On Wed, 2011-02-23 at 11:06 -0700, Alex Williamson wrote:
> On Wed, 2011-02-23 at 15:12 +0200, Avi Kivity wrote:
> > On 02/22/2011 08:54 PM, Alex Williamson wrote:
> > > This series introduces a new weight-balanced binary tree (wbtree) for
> > > general use.  It's largely leveraged from the rbtree, copying its
> > > rotate functions, while introducing different rebalance and erase
> > > functions.  This tree is particularly useful for managing memory
> > > ranges, where it's desirable to have the most likely targets (the
> > > largest ranges) at the top of each subtree.
> > >
> > > Patches 2 & 3 go on to convert the KVM memory slots to a growable
> > > array and make use of wbtree for efficient management.  Trying to
> > > exercise the worst case for this data structure, I ran netperf
> > > TCP_RR on an emulated rtl8139 NIC connected directly to the host
> > > via a tap.  Both qemu-kvm and the netserver on the host were
> > > pinned to optimal CPUs with taskset.  This series resulted in
> > > a 3% improvement for this test.
> >
> > In this case, I think most of the faults (at least after the guest was
> > warmed up) missed the tree completely.
>
> Except for the mmio faults for the NIC, which will traverse the entire
> depth of that branch of the tree for every access.
>
> > In this case a weight-balanced tree is hardly optimal (it is optimized
> > for hits), so I think you'll see a bigger gain from the mmio fault
> > optimization.  You'll probably see most of the gain running mmu
> > intensive tests with ept=0.
>
> Right, the gain expected by this test is that we're only traversing 6-7
> tree nodes until we don't find a match, versus the full 32 entries of
> the original memslot array.  So it's effectively comparing worst case
> scenarios for both data structures.
>
> Hopefully the followup with kernbench run with ept=0 shows that there's
> also still a benefit in the data match scenario.  The existing array
> ends up being nearly optimal for memory hits since it registers memory
> from 1M - 3.5G in slot 0 and 4G - 10.5G in slot 1.  For the tree, we
> jump straight to the bigger slot.  I'll run one more set of kernbench
> tests with the original code, just reversing slots 0 & 1 to see if we
> take much of a hit from the tree overhead.  Thanks,

I had forgotten about <1M mem, so actually the slot configuration was:

  0: <1M
  1: 1M - 3.5G
  2: 4G+

I stacked the deck in favor of the static array (0: 4G+, 1: 1M - 3.5G,
2: <1M), and got these kernbench results:

        |    base (stdev) | reorder (stdev) |  wbtree (stdev) |
--------+-----------------+-----------------+-----------------+
Elapsed |   42.809 (0.19) |   42.160 (0.22) |   42.305 (0.23) |
User    |  115.709 (0.22) |  114.358 (0.40) |  114.720 (0.31) |
System  |   41.605 (0.14) |   40.741 (0.22) |   40.924 (0.20) |
%cpu    |    366.9 (1.45) |    367.4 (1.17) |    367.6 (1.51) |
context |   7272.3 (68.6) |   7248.1 (89.7) |   7249.5 (97.8) |
sleeps  | 14826.2 (110.6) |  14780.7 (86.9) |  14798.5 (63.0) |

So wbtree is only slightly behind reordering, and the standard deviations
suggest the runs are mostly within the noise of each other.

Thanks,

Alex
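
For reference, here is a minimal, self-contained sketch of the lookup shape
being discussed.  This is illustrative only, not the actual wbtree or KVM
code: struct mem_slot and slot_tree_find are made-up names, and the
weight-based rebalancing rotations are omitted.  The idea is that slots sit
in a binary search tree keyed on guest frame number, with the largest slot
kept at the root, so a RAM hit usually resolves near the top while a miss
(e.g. an mmio gfn) walks a single root-to-leaf path instead of scanning all
32 entries of a flat array.

/* Sketch only: not the kvm/wbtree API. */
#include <stddef.h>
#include <stdio.h>

typedef unsigned long long gfn_t;

struct mem_slot {
	gfn_t base_gfn;            /* first guest frame covered by the slot */
	gfn_t npages;              /* size in pages; acts as the "weight" */
	struct mem_slot *left;     /* slots starting below base_gfn */
	struct mem_slot *right;    /* slots starting at or above base_gfn + npages */
};

/*
 * Walk one root-to-leaf path.  A miss touches only O(tree depth) nodes,
 * versus scanning every entry of a flat memslot array.
 */
static struct mem_slot *slot_tree_find(struct mem_slot *node, gfn_t gfn)
{
	while (node) {
		if (gfn < node->base_gfn)
			node = node->left;
		else if (gfn >= node->base_gfn + node->npages)
			node = node->right;
		else
			return node;   /* gfn falls inside this slot's range */
	}
	return NULL;                   /* no slot matches, e.g. an mmio gfn */
}

int main(void)
{
	/* Two RAM slots roughly like the 1M-3.5G and 4G-10.5G layout above;
	 * the larger (4G+) slot is the root, the smaller one its left child. */
	struct mem_slot low  = { .base_gfn = 0x100,    .npages = 0xdff00 };
	struct mem_slot high = { .base_gfn = 0x100000, .npages = 0x1a0000,
				 .left = &low };

	printf("gfn 0x%llx -> %s\n", 0x1000ULL,
	       slot_tree_find(&high, 0x1000) ? "RAM slot" : "miss (mmio)");
	printf("gfn 0x%llx -> %s\n", 0xfee00ULL,
	       slot_tree_find(&high, 0xfee00) ? "RAM slot" : "miss (mmio)");
	return 0;
}

With the two slots above, the mmio-like gfn falls off the tree after
visiting both nodes; that root-to-leaf miss is the worst case the netperf
test in the cover letter is exercising.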