On Wed, May 02, 2012 at 02:52:14PM +0200, Alexander Graf wrote:
> >+	/*
> >+	 * If the user wants a different size from default,
> >+	 * try first to allocate it from the kernel page allocator.
> >+	 */
> >+	hpt = 0;
> >+	if (order != kvm_hpt_order) {
> > 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
> >-				       __GFP_NOWARN, HPT_ORDER - PAGE_SHIFT);
> >+				       __GFP_NOWARN, order - PAGE_SHIFT);
> 
> I like the idea quite a lot, but shouldn't we set a max memory
> threshold through a kernel module option that a user space program
> could pin this way?

The page allocator will hand out at most 16MB with the default setting
of CONFIG_FORCE_MAX_ZONEORDER.  Any larger request will fail, and
userspace can only do one allocation per VM.

Of course, userspace can start hundreds of VMs and use up memory that
way - but that is exactly the same situation as we have already, where
we allocate 16MB per VM at VM creation time, so this patch makes
things no worse in that regard.

> >+long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >+{
> >+	long err = -EBUSY;
> >+	long order;
> >+
> >+	mutex_lock(&kvm->lock);
> >+	if (kvm->arch.rma_setup_done) {
> >+		kvm->arch.rma_setup_done = 0;
> >+		/* order rma_setup_done vs. vcpus_running */
> >+		smp_mb();
> >+		if (atomic_read(&kvm->arch.vcpus_running)) {
> 
> Is this safe? What if we're running one vcpu thread and one hpt
> reset thread. The vcpu thread starts, is just before the vcpu_run
> function, the reset thread checks if anything is running, nothing is
> running, the vcpu thread goes on to do its thing and boom things
> break.

No - in the situation you describe, the vcpu thread will see that
kvm->arch.rma_setup_done is clear and call kvmppc_hv_setup_htab_rma(),
where the first thing it does is mutex_lock(&kvm->lock), so it sits
there until the reset thread has finished and unlocked the mutex.  The
vcpu thread has to see rma_setup_done clear, since the reset thread
clears it and then has a barrier before testing vcpus_running.
If, on the other hand, the vcpu thread gets there first and sees that
kvm->arch.rma_setup_done is still set, then the reset thread must see
that kvm->arch.vcpus_running is non-zero, since the vcpu thread first
increments vcpus_running and then tests rma_setup_done, with a barrier
in between.

> But then again, what's the problem with having vcpus running while
> we're clearing the HPT? Stale TLB entries?

If a vcpu is running, it could call kvmppc_h_enter() to set an HPTE at
the exact same time that we are clearing that HPTE, and we could end
up with inconsistent results.  I didn't want to add extra locking in
kvmppc_h_enter() to make sure that couldn't happen, since
kvmppc_h_enter() is quite a hot path.  Instead I added a little bit of
overhead on vcpu entry/exit, since that happens much more rarely.  In
any case, if we're resetting the VM, there shouldn't be any vcpus
running.

Your comment has, however, reminded me that I need to arrange for a
full TLB flush to occur the first time we run each vcpu after the
reset.  I'll do a new version that includes that.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html