On 04/10/2012 03:46 AM, Marcelo Tosatti wrote:
> On Tue, Apr 10, 2012 at 02:26:27AM +0800, Xiao Guangrong wrote:
>> On 04/10/2012 01:58 AM, Marcelo Tosatti wrote:
>>
>>> On Mon, Apr 09, 2012 at 04:12:46PM +0300, Avi Kivity wrote:
>>>> On 03/29/2012 11:20 AM, Xiao Guangrong wrote:
>>>>> * Idea
>>>>> The present bit of page fault error code (EFEC.P) indicates whether the
>>>>> page table is populated on all levels. If this bit is set, we know
>>>>> the page fault is caused by the page-protection bits (e.g. W/R bit) or
>>>>> the reserved bits.
>>>>>
>>>>> In KVM, in most cases, this kind of page fault (EFEC.P = 1) can be
>>>>> simply fixed: the page fault caused by a reserved bit
>>>>> (EFEC.P = 1 && EFEC.RSV = 1) has already been filtered out in the fast
>>>>> mmio path. What we need to do to fix the remaining page faults
>>>>> (EFEC.P = 1 && RSV != 1) is just to increase the corresponding access
>>>>> on the spte.
>>>>>
>>>>> This patchset introduces a fast path to fix this kind of page fault: it
>>>>> is out of mmu-lock and need not walk the host page table to get the
>>>>> mapping from gfn to pfn.
>>>>>
>>>>>
>>>>
>>>> This patchset is really worrying to me.
>>>>
>>>> It introduces a lot of concurrency into data structures that were not
>>>> designed for it. Even if it is correct, it will be very hard to
>>>> convince ourselves that it is correct, and if it isn't, to debug those
>>>> subtle bugs. It will also be much harder to maintain the mmu code than
>>>> it is now.
>>>>
>>>> There are a lot of things to check. Just as an example, we need to be
>>>> sure that if we use rcu_dereference() twice in the same code path, that
>>>> any inconsistencies due to a write in between are benign. Doing that is
>>>> a huge task.
>>>>
>>>> But I appreciate the performance improvement and would like to see a
>>>> simpler version make it in. This needs to reduce the amount of data
>>>> touched in the fast path so it is easier to validate, and perhaps reduce
>>>> the number of cases that the fast path works on.
>>>>
>>>> I would like to see the fast path as simple as
>>>>
>>>>     rcu_read_lock();
>>>>
>>>>     (lockless shadow walk)
>>>>     spte = ACCESS_ONCE(*sptep);
>>>>
>>>>     if (!(spte & PT_MAY_ALLOW_WRITES))
>>>>           goto slow;
>>>>
>>>>     gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->sptes);
>>>>     mark_page_dirty(kvm, gfn);
>>>>
>>>>     new_spte = spte & ~(PT64_MAY_ALLOW_WRITES | PT_WRITABLE_MASK);
>>>>     if (cmpxchg(sptep, spte, new_spte) != spte)
>>>>            goto slow;
>>>>
>>>>     rcu_read_unlock();
>>>>     return;
>>>>
>>>> slow:
>>>>     rcu_read_unlock();
>>>>     slow_path();
>>>>
>>>> It now becomes the responsibility of the slow path to maintain *sptep &
>>>> PT_MAY_ALLOW_WRITES, but that path has a simpler concurrency model. It
>>>> can be as simple as a clear_bit() before we update sp->gfns[] or if we
>>>> add host write protection.
>>>>
>>>> Sorry, it's too complicated for me. Marcelo, what's your take?
>>>
>>> The improvement is small and limited to special cases (migration should
>>> be rare and framebuffer memory accounts for a small percentage of total
>>> memory).
>>
>>
>> Actually, although the framebuffer is small, it is modified really
>> frequently, and another unlucky thing is that dirty logging is also
>> very frequent and needs to hold mmu-lock to do write-protect.
>>
>> Yes, if Xwindow is not enabled, the benefit is limited. :)
>
> Ignoring that fact, the safety of lockless set_spte and friends is not
> clear.
>

That is why Avi suggested that I simplify the whole thing. :)

> Perhaps the mmu_lock hold times by get_dirty are a large component here?
> If that can be alleviated, not only RO->RW faults benefit.
>

Yes.
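
For reference, the simplified fast path Avi sketches in the quoted text can be modeled in userspace C roughly as follows. This is only a minimal sketch under stated assumptions, not the KVM implementation: the bit positions and the names SPTE_WRITABLE, SPTE_MAY_ALLOW_WRITES, fast_fix_write_fault() and slow_path_write_protect() are hypothetical, C11 atomics stand in for ACCESS_ONCE() and cmpxchg(), and the RCU read section, the shadow walk and mark_page_dirty() are left out. It also assumes the intent of the fast path is to set the writable bit once the "may allow writes" bit has been observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; not KVM's real spte format. */
#define SPTE_WRITABLE         (UINT64_C(1) << 1)
#define SPTE_MAY_ALLOW_WRITES (UINT64_C(1) << 62)

static_assert((SPTE_WRITABLE & SPTE_MAY_ALLOW_WRITES) == 0,
              "flag bits must not overlap");

/*
 * Lockless fast path: try to make the entry writable.
 * Returns true on success, false if the caller must take the
 * locked slow path instead.
 */
static bool fast_fix_write_fault(_Atomic uint64_t *sptep)
{
    /* Read the entry exactly once; every decision uses this snapshot. */
    uint64_t spte = atomic_load_explicit(sptep, memory_order_relaxed);

    /* The slow path has not granted permission: do not touch the entry. */
    if (!(spte & SPTE_MAY_ALLOW_WRITES))
        return false;

    uint64_t new_spte = spte | SPTE_WRITABLE;

    /* Publish only if nobody changed the entry since the snapshot. */
    return atomic_compare_exchange_strong(sptep, &spte, new_spte);
}

/*
 * Slow-path side (would run under the lock): revoke the permission
 * bit together with the writable bit, so that a racing fast path
 * either fails its compare-exchange or sees the permission cleared.
 */
static void slow_path_write_protect(_Atomic uint64_t *sptep)
{
    atomic_fetch_and(sptep, ~(SPTE_MAY_ALLOW_WRITES | SPTE_WRITABLE));
}
```

The point of this shape is that the fast path touches a single word and decides everything from one snapshot of it, so the only thing to validate is that the slow path keeps the permission bit accurate.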