On Tue, 2010-09-14 at 11:12 +0200, Avi Kivity wrote:
> >> +	base = (unsigned long)from->iov_base + offset1;
> >> +	size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> >> +	num_pages = get_user_pages_fast(base, size, 0, &page[i]);
> >> +	if ((num_pages != size) ||
> >> +	    (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
> >> +		/* put_page is in skb free */
> >> +		return -EFAULT;
> >
> > What keeps the user from writing to these pages in its address space
> > after the write call returns?
> >
> > A write() return of success means:
> >
> >	"I wrote what you gave to me"
> >
> > not
> >
> >	"I wrote what you gave to me, oh and BTW don't touch these
> >	 pages for a while."
> >
> > In fact "a while" isn't even defined in any way, as there is no way
> > for the write() invoker to know when the networking card is done with
> > those pages.
>
> That's what io_submit() is for.  Then io_getevents() tells you what "a
> while" actually was.

The macvtap zero-copy path uses iov buffers from the vhost ring, which
are allocated by the guest kernel. In the host kernel, vhost calls
macvtap's sendmsg, which calls get_user_pages_fast() to pin these
buffers' pages for zero copy.

The patch relies on how vhost handles these buffers. I need to look at
the vhost code (qemu) first before I can address the questions here.

Thanks
Shirley
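
For reference, the page-count arithmetic in the quoted hunk can be checked
in userspace. This is only a minimal sketch, not code from the patch: it
assumes 4 KiB pages, redefines PAGE_SHIFT, PAGE_SIZE and PAGE_MASK as
userspace stand-ins for the kernel macros, and pages_spanned() and
check_examples() are hypothetical names introduced here for illustration.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel macros (assumption: 4 KiB pages). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [base, base + len). Same expression as the quoted hunk:
 *   (base & ~PAGE_MASK)  offset of base within its first page
 *   + len                bytes to cover
 *   + ~PAGE_MASK         PAGE_SIZE - 1, so the shift rounds up
 */
static unsigned long pages_spanned(unsigned long base, unsigned long len)
{
	return ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
}

/* A few hand-checked cases. */
static void check_examples(void)
{
	/* Page-aligned, exactly one page long: one page. */
	assert(pages_spanned(0x1000, PAGE_SIZE) == 1);
	/* 32 bytes starting 16 bytes before a page boundary: two pages. */
	assert(pages_spanned(0x1ff0, 32) == 2);
	/* One page long but starting 1 byte into a page: two pages. */
	assert(pages_spanned(0x1001, PAGE_SIZE) == 2);
	/* Zero-length range at an aligned address: no pages. */
	assert(pages_spanned(0x2000, 0) == 0);
}
```

Calling check_examples() from a test main() exercises the cases above. The
point is that an unaligned iov_base can make a buffer of len bytes span one
more page than len / PAGE_SIZE suggests, which is why the hunk compares the
result of get_user_pages_fast() against this count and not against a value
derived from len alone.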