On Thu, 17 Nov 2011 12:27:41 +0200
Avi Kivity <avi@xxxxxxxxxx> wrote:

> On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > From: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> >
> > There is a potential host deadlock in the tprot intercept handling.
> > We must not hold the mmap semaphore while resolving the guest
> > address. If userspace is remapping, then the memory detection in
> > the guest is broken anyway so we can safely separate the
> > address translation from walking the vmas.
> >
> > Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx>
> > Signed-off-by: Carsten Otte <cotte@xxxxxxxxxx>
> > ---
> >
> >  arch/s390/kvm/priv.c |   10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > --- linux-2.6/arch/s390/kvm/priv.c	2011-10-24 09:10:05.000000000 +0200
> > +++ linux-2.6-patched/arch/s390/kvm/priv.c	2011-11-17 10:03:53.000000000 +0100
> > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> >  	u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> >  	u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> >  	struct vm_area_struct *vma;
> > +	unsigned long user_address;
> >
> >  	vcpu->stat.instruction_tprot++;
> >
> > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> >  		return -EOPNOTSUPP;
> >
> >
> > +	/* we must resolve the address without holding the mmap semaphore.
> > +	 * This is ok since the userspace hypervisor is not supposed to change
> > +	 * the mapping while the guest queries the memory. Otherwise the guest
> > +	 * might crash or get wrong info anyway. */
> > +	user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > +
> >  	down_read(&current->mm->mmap_sem);
> > -	vma = find_vma(current->mm,
> > -			(unsigned long) __guestaddr_to_user(vcpu, address1));
> > +	vma = find_vma(current->mm, user_address);
> >  	if (!vma) {
> >  		up_read(&current->mm->mmap_sem);
> >  		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> >
>
> Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> dereferences the guest page table?  How can it assume that it is mapped?

The gmap code does not assume that the code is mapped. If the individual
MB has not been mapped in the guest address space or the target memory
is not mapped in the process address space __gmap_fault() returns -EFAULT.

> I'm probably misreading the code.
>
> A little closer to the patch, x86 handles the same issue by calling
> get_user_pages_fast().  This should be more scalable than bouncing
> mmap_sem, something to consider.

I don't think that the frequency of asynchronous page faults will make
it necessary to use get_user_pages_fast(). We are talking about the
case where I/O is necessary to provide the page that the guest accessed.

The advantage of the way s390 does things is that after __gmap_fault
translated the guest address to a user space address we can just do a
standard page fault for the user space process. Only if that requires
I/O we go the long way. Makes sense?

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
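
For anyone following the ordering argument in the patch, here is a minimal
userspace sketch in C of the same pattern: translate first with the lock
dropped, then take the lock only for the vma walk. Everything named toy_*
is hypothetical and not a kernel API. The assumption that the translation
path may itself take the lock is mine; it is modelled with a reader count
and an assert. The return codes are illustrative, not the real TPROT
condition codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a read/write semaphore: only counts readers. */
struct toy_lock { int readers; };

static void toy_down_read(struct toy_lock *l) { l->readers++; }
static void toy_up_read(struct toy_lock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Toy stand-in for a vma: the half-open range [start, end). */
struct toy_vma { uintptr_t start, end; int writable; };

struct toy_mm {
	struct toy_lock mmap_lock;
	const struct toy_vma *vmas;	/* sorted by start, non-overlapping */
	size_t nr_vmas;
	uintptr_t guest_base;		/* user address of guest address 0 */
	uint64_t guest_size;		/* size of guest memory in bytes */
};

/*
 * Assumption: translating may take the lock internally (for example to
 * resolve a fault), so it must be entered with the lock NOT held.
 */
static int toy_guest_to_user(struct toy_mm *mm, uint64_t gaddr,
			     uintptr_t *uaddr)
{
	assert(mm->mmap_lock.readers == 0);	/* caller must not hold it */
	if (gaddr >= mm->guest_size)
		return -1;
	toy_down_read(&mm->mmap_lock);
	*uaddr = mm->guest_base + (uintptr_t)gaddr;
	toy_up_read(&mm->mmap_lock);
	return 0;
}

/* Like find_vma(): first vma whose end lies above addr, or NULL. */
static const struct toy_vma *toy_find_vma(const struct toy_mm *mm,
					  uintptr_t addr)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Returns 0 if writable, 1 if read-only, -1 on an addressing error. */
static int toy_tprot(struct toy_mm *mm, uint64_t gaddr)
{
	const struct toy_vma *vma;
	uintptr_t uaddr;
	int cc;

	/* Step 1: resolve the address with the lock dropped. */
	if (toy_guest_to_user(mm, gaddr, &uaddr) != 0)
		return -1;

	/* Step 2: hold the lock only while walking the vmas. */
	toy_down_read(&mm->mmap_lock);
	vma = toy_find_vma(mm, uaddr);
	if (!vma || uaddr < vma->start)
		cc = -1;
	else
		cc = vma->writable ? 0 : 1;
	toy_up_read(&mm->mmap_lock);
	return cc;
}
```

If toy_guest_to_user() were called between toy_down_read() and
toy_up_read() in toy_tprot(), the assert would fire. That is the toy
equivalent of the nested acquisition the patch removes.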