So - does this driver help reduce service demand significantly?

Some comments from looking at the code:

On Fri, Aug 06, 2010 at 05:23:41PM +0800, xiaohui.xin@xxxxxxxxx wrote:
> +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> +			struct kiocb *iocb, struct iovec *iov,
> +			int count, struct frag *frags,
> +			int npages, int total)
> +{
> +	int rc;
> +	int i, j, n = 0;
> +	int len;
> +	unsigned long base, lock_limit;
> +	struct page_info *info = NULL;
> +
> +	lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> +	lock_limit >>= PAGE_SHIFT;

Playing with the rlimit on the data path, transparently to the
application, looks strange to me; I suspect it has unexpected security
implications. Further, applications may have other uses for locked
memory besides mpassthru - you should not just take it because it's
there.

Can we have an ioctl that lets userspace configure how much memory to
lock? This ioctl would decrement the rlimit and store the amount in the
device structure so we can do the accounting internally. Put it back on
close or on another ioctl. We need to be careful about what happens
when this operation is called again with 0 or another small value while
memory is still locked - maybe just fail with EBUSY? Or wait until it
gets unlocked? Maybe 0 can be special-cased to deactivate zero-copy.

> +
> +	if (ctor->lock_pages + count > lock_limit && npages) {
> +		printk(KERN_INFO "exceed the locked memory rlimit.");
> +		return NULL;
> +	}
> +
> +	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);

You seem to fill in all of the memory, so why zalloc? This is the data
path ...

> +
> +	if (!info)
> +		return NULL;
> +
> +	for (i = j = 0; i < count; i++) {
> +		base = (unsigned long)iov[i].iov_base;
> +		len = iov[i].iov_len;
> +
> +		if (!len)
> +			continue;
> +		n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +
> +		rc = get_user_pages_fast(base, n, npages ? 1 : 0,

npages controls whether this is a write? Why?

> +					&info->pages[j]);
> +		if (rc != n)
> +			goto failed;
> +
> +		while (n--) {
> +			frags[j].offset = base & ~PAGE_MASK;
> +			frags[j].size = min_t(int, len,
> +					PAGE_SIZE - frags[j].offset);
> +			len -= frags[j].size;
> +			base += frags[j].size;
> +			j++;
> +		}
> +	}
> +
> +#ifdef CONFIG_HIGHMEM
> +	if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> +		for (i = 0; i < j; i++) {
> +			if (PageHighMem(info->pages[i]))
> +				goto failed;
> +		}
> +	}
> +#endif

Are non-highdma devices worth bothering with? If yes - are there other
limitations devices might have that we need to handle? E.g. what about
non-s/g devices, or devices without checksum offloading?

> +	skb_push(skb, ETH_HLEN);
> +
> +	if (skb_is_gso(skb)) {
> +		hdr.hdr.hdr_len = skb_headlen(skb);
> +		hdr.hdr.gso_size = shinfo->gso_size;
> +		if (shinfo->gso_type & SKB_GSO_TCPV4)
> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> +		else if (shinfo->gso_type & SKB_GSO_TCPV6)
> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +		else if (shinfo->gso_type & SKB_GSO_UDP)
> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> +		else
> +			BUG();
> +		if (shinfo->gso_type & SKB_GSO_TCP_ECN)
> +			hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> +
> +	} else
> +		hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
> +
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +		hdr.hdr.csum_start =
> +			skb->csum_start - skb_headroom(skb);
> +		hdr.hdr.csum_offset = skb->csum_offset;
> +	}

We have this code in tun, macvtap and packet socket already. Could this
be a good time to move it into the networking core?
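To be concrete, something along these lines is what I have in mind -
just a rough sketch of factoring out the code quoted above; the
function name, exact signature and placement are made up, and it
assumes the caller deals with any mergeable-buffer/num_buffers handling
and with copying the header out to userspace:

#include <linux/skbuff.h>
#include <linux/virtio_net.h>

/* Fill a virtio_net_hdr from an skb's GSO and checksum state. */
void skb_to_virtio_net_hdr(struct sk_buff *skb, struct virtio_net_hdr *hdr)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);

	memset(hdr, 0, sizeof(*hdr));

	if (skb_is_gso(skb)) {
		hdr->hdr_len = skb_headlen(skb);
		hdr->gso_size = shinfo->gso_size;
		if (shinfo->gso_type & SKB_GSO_TCPV4)
			hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
		else if (shinfo->gso_type & SKB_GSO_TCPV6)
			hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
		else if (shinfo->gso_type & SKB_GSO_UDP)
			hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
		else
			BUG();
		if (shinfo->gso_type & SKB_GSO_TCP_ECN)
			hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
	} else {
		hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
	}

	if (skb->ip_summed == CHECKSUM_PARTIAL) {
		hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
		hdr->csum_start = skb->csum_start - skb_headroom(skb);
		hdr->csum_offset = skb->csum_offset;
	}
}

tun, macvtap, packet socket and this driver could then all call such a
helper instead of each open-coding the same switch.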
I'm not asking you to do this right now, but could this generic
virtio-net to skb stuff be encapsulated in functions?

--
MST