On Wed, Jun 17, 2015 at 09:33:57AM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 08:31:23 +0200
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
> > On Wed, Jun 17, 2015 at 12:19:15AM +0200, Igor Mammedov wrote:
> > > On Tue, 16 Jun 2015 23:16:07 +0200
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > >
> > > > On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote:
> > > > > Series extends vhost to support up to 509 memory regions,
> > > > > and adds some vhost:translate_desc() performance improvements
> > > > > so it won't regress when memslots are increased to 509.
> > > > >
> > > > > It fixes a running VM crashing during memory hotplug due
> > > > > to vhost refusing to accept more than 64 memory regions.
> > > > >
> > > > > It's only a host kernel side fix to make it work with QEMU
> > > > > versions that support memory hotplug. But I'll continue
> > > > > to work on a QEMU side solution to reduce the number of memory
> > > > > regions to make things even better.
> > > >
> > > > I'm concerned userspace work will be harder, in particular,
> > > > performance gains will be harder to measure.
> > > it appears so, so far.
> > >
> > > > How about a flag to disable caching?
> > > I've tried to measure the cost of a cache miss but without much luck,
> > > the difference between the version with cache and with caching removed
> > > was within the margin of error (±10ns) (i.e. not measurable on my
> > > 5min/10*10^6 test workload).
> >
> > Confused. I thought it was very much measurable.
> > So why add a cache if you can't measure its effect?
> I haven't been able to measure the immediate delta between function
> start/end with precision better than 10ns, perhaps the method used
> (SystemTap) is to blame.
> But it's still possible to measure indirectly like 2% from 5/5.

Ah, makes sense.

> >
> > > Also I'm concerned that adding an extra fetch+branch for flag
> > > checking will make things worse for the likely path of a cache hit,
> > > so I'd avoid it if possible.
> > >
> > > Or do you mean a simple global per-module flag to disable it,
> > > wrapped in a static key so that it will be a cheap jump to skip
> > > the cache?
> >
> > Something like this, yes.
> ok, will do.
>
> >
> > > > > Performance wise for guest with (in my case 3 memory regions)
> > > > > and netperf's UDP_RR workload translate_desc() execution
> > > > > time from total workload takes:
> > > > >
> > > > > Memory      |1G RAM|cached|non cached
> > > > > regions #   |  3   |  53  |  53
> > > > > ------------------------------------
> > > > > upstream    | 0.3% |  -   | 3.5%
> > > > > ------------------------------------
> > > > > this series | 0.2% | 0.5% | 0.7%
> > > > >
> > > > > where the "non cached" column reflects a thrashing workload
> > > > > with constant cache misses. More details on timing in
> > > > > respective patches.
> > > > >
> > > > > Igor Mammedov (5):
> > > > >   vhost: use binary search instead of linear in find_region()
> > > > >   vhost: extend memory regions allocation to vmalloc
> > > > >   vhost: support upto 509 memory regions
> > > > >   vhost: add per VQ memory region caching
> > > > >   vhost: translate_desc: optimization for desc.len < region size
> > > > >
> > > > >  drivers/vhost/vhost.c | 95 +++++++++++++++++++++++++++++++++++++--------------
> > > > >  drivers/vhost/vhost.h |  1 +
> > > > >  2 files changed, 71 insertions(+), 25 deletions(-)
> > > > >
> > > > > --
> > > > > 1.8.3.1
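
For readers following the thread, the two lookup changes named in the series (binary search in find_region() and a per-VQ region cache) can be sketched roughly as below. This is a user-space illustration only, not the code from the patches: struct mem_region, struct region_cache, find_region_sorted() and find_region_cached() are hypothetical names, and the sketch assumes the regions are sorted by guest physical address and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor (assumption, not the kernel's struct). */
struct mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t memory_size;     /* length of the region in bytes */
    uint64_t userspace_addr;  /* where the region is mapped in the host process */
};

/* Hypothetical per-virtqueue cache: remembers the last region that matched. */
struct region_cache {
    const struct mem_region *last;
};

/* True if addr falls inside [guest_phys_addr, guest_phys_addr + memory_size). */
static int region_contains(const struct mem_region *r, uint64_t addr)
{
    return addr >= r->guest_phys_addr &&
           addr - r->guest_phys_addr < r->memory_size;
}

/*
 * Binary search over regions sorted by guest_phys_addr.
 * Returns the region containing addr, or NULL if none does.
 */
static const struct mem_region *
find_region_sorted(const struct mem_region *regions, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(regions != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct mem_region *r = &regions[mid];

        if (addr < r->guest_phys_addr)
            hi = mid;               /* addr lies below this region */
        else if (addr - r->guest_phys_addr >= r->memory_size)
            lo = mid + 1;           /* addr lies above this region */
        else
            return r;               /* addr is inside this region */
    }
    return NULL;
}

/*
 * Try the cached region first; fall back to the binary search on a miss
 * and remember the result for the next lookup.
 */
static const struct mem_region *
find_region_cached(struct region_cache *cache,
                   const struct mem_region *regions, size_t n, uint64_t addr)
{
    const struct mem_region *r = cache->last;

    if (r && region_contains(r, addr))
        return r;                   /* cache hit: no search needed */

    r = find_region_sorted(regions, n, addr);
    if (r)
        cache->last = r;            /* only successful lookups update the cache */
    return r;
}
```

A caller would keep one struct region_cache per virtqueue and call find_region_cached() for each descriptor address. A workload that alternates between regions on every lookup pays for the failed cache check plus the search each time, which corresponds to the "non cached" column in the table above.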