On Fri, 31 May 2013 01:24:43 +0900
Takuya Yoshikawa <takuya.yoshikawa@xxxxxxxxx> wrote:

> On Thu, 30 May 2013 03:53:38 +0300
> Gleb Natapov <gleb@xxxxxxxxxx> wrote:
>
> > On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
> > > On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
> > > > On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
> > > >>>>> the pages since other vcpus may be doing locklessly shadow
> > > >>>>> page walking
> > > >>>
> > > >>> Ah, yes, i agree with you.
> > > >>>
> > > >>> We can introduce a list, say kvm->arch.obsolte_pages, to link all of the
> > > >>> zapped-page, the page-shrink will free the page on that list first.
> > > >>>
> > > >>> Marcelo, if you do not have objection on patch 1 ~ 8 and 11, could you please
> > > >>> let them merged first, and do add some comments and tlb optimization later?
> > > >>
> > > >> Exclude patch 11 please, since it depends on the "collapse" optimization.
> > > >
> > > > I'm fine with patch 1 being merged. I think the remaining patches need better
> > > > understanding or explanation. The problems i see are:
> > > >
> > > > 1) The magic number "10" to zap before considering reschedule is
> > > > annoying. It would be good to understand why it is needed at all.
> > >
> > > ......
> > >
> > > >
> > > > But then again, the testcase is measuring kvm_mmu_zap_all performance
> > > > alone which we know is not a common operation, so perhaps there is
> > > > no need for that minimum-pages-to-zap-before-reschedule.
> > >
> > > Well. Although this is not a common operation, it can be
> > > triggered by a VCPU - if one VCPU takes a long time on zap-all-pages,
> > > other vcpus are missing IPI-sync, or missing IO. This can easily cause
> > > soft lockups if the vcpu is doing memslot-related things.
> > >
> > +1. If it is triggerable by a guest it may slow down the guest, but we
> > should not allow for it to slow down a host.
> >
>
> Well, I don't object to the minimum-pages-to-zap-before-reschedule idea
> itself, but if you're going to take patch 4, please at least add a warning
> in the changelog that the magic number "10" was selected without good enough
> reasoning.
>
> "[ It improves kernel building 0.6% ~ 1% ]" alone will make it hard for
> others to change the number later.
>
> I actually once tried to do a similar thing for other code. So I have a
> possible reasoning for this, and 10 should probably be changed later.
>

In this case, the solution seems to be very simple: just drop spin_needbreak()
and leave need_resched() alone.

This way we can guarantee that zap-all will get a fair amount of CPU time for
each scheduling from the host scheduler's point of view. Of course this can
block other VCPU threads waiting for mmu_lock during that time slice, but
should be much better than blocking them for some magical number of zappings.

We also need to remember that spin_needbreak() does not do anything for some
preempt config settings.

	Takuya
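
To make the difference between the two break conditions concrete, here is a
rough userspace model. It is not the KVM code and not the patch itself: every
name in it (sim_zap_all, break_policy, BATCH_MIN, the sim_* helpers) is made
up for illustration, the "time slice" is counted in zapped pages rather than
in time, and the batch policy is only my reading of the
"zap 10 pages, then check need_resched() or spin_needbreak()" behaviour
discussed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only; none of these names exist in the kernel. */
#define BATCH_MIN 10 /* the magic number "10" discussed in the thread */

enum break_policy {
    BREAK_AFTER_BATCH, /* zap >= BATCH_MIN pages, then honor either check */
    BREAK_ON_RESCHED   /* honor only the need_resched() stand-in */
};

struct sim {
    size_t slice;     /* scheduler wants the CPU back every `slice` zaps, 0 = never */
    bool contended;   /* some other vCPU is always waiting for the lock */
    size_t since_brk; /* pages zapped since the lock was last dropped */
};

/* Stand-in for need_resched(): true once the modeled time slice is used up. */
static bool sim_need_resched(const struct sim *s)
{
    return s->slice != 0 && s->since_brk >= s->slice;
}

/* Stand-in for spin_needbreak(): true whenever the lock has waiters. */
static bool sim_spin_needbreak(const struct sim *s)
{
    return s->contended;
}

/* Zap `pages` pages and return how many times the lock was dropped. */
size_t sim_zap_all(size_t pages, size_t slice, bool contended,
                   enum break_policy policy)
{
    struct sim s = { .slice = slice, .contended = contended, .since_brk = 0 };
    size_t breaks = 0;

    while (pages > 0) {
        bool want_break;

        if (policy == BREAK_AFTER_BATCH)
            want_break = s.since_brk >= BATCH_MIN &&
                         (sim_need_resched(&s) || sim_spin_needbreak(&s));
        else
            want_break = sim_need_resched(&s);

        if (want_break) {
            /* Models dropping mmu_lock and rescheduling. */
            breaks++;
            s.since_brk = 0;
            continue;
        }

        /* Models zapping one shadow page under the lock. */
        pages--;
        s.since_brk++;
    }
    return breaks;
}
```

With 1000 pages, a slice of 100 zaps and a permanently contended lock, the
batch policy drops the lock 99 times (once every 10 pages) while the
need_resched()-only policy drops it 9 times (once per slice). Without
contention both policies give 9. The model says nothing about real timing; it
only shows that the break frequency under contention is set by the magic
number in one case and by the scheduler in the other.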