>>Playing with rlimit on data path, transparently to the application in this way >>looks strange to me, I suspect this has unexpected security implications. >>Further, applications may have other uses for locked memory >>besides mpassthru - you should not just take it because it's there. >> >>Can we have an ioctl that lets userspace configure how much >>memory to lock? This ioctl will decrement the rlimit and store >>the data in the device structure so we can do accounting >>internally. Put it back on close or on another ioctl. >Yes, we can decrement the rlimit in ioctl in one time to avoid >data path. > >>Need to be careful for when this operation gets called >>again with 0 or another small value while we have locked memory - >>maybe just fail with EBUSY? or wait until it gets unlocked? >>Maybe 0 can be special-cased and deactivate zero-copy?. >> How about we don't use a new ioctl, but just check the rlimit in one MPASSTHRU_BINDDEV ioctl? If we find mp device break the rlimit, then we fail the bind ioctl, and thus can't do zero copy any more. >In fact, if we choose RLIMIT_MEMLOCK to limit the lock memory, >the default value is only 16 pages. It's too small to make the device to >work. So we always to configure it with a large value. >I think, if rlimit value after decrement is < 0, then deactivate zero-copy >is better. 0 maybe ok. > >>> + >>> + if (ctor->lock_pages + count > lock_limit && npages) { >>> + printk(KERN_INFO "exceed the locked memory rlimit."); >>> + return NULL; >>> + } >>> + >>> + info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL); >> >>You seem to fill in all memory, why zalloc? this is data path ... > >Ok, Let me check this. > >> >>> + >>> + if (!info) >>> + return NULL; >>> + >>> + for (i = j = 0; i < count; i++) { >>> + base = (unsigned long)iov[i].iov_base; >>> + len = iov[i].iov_len; >>> + >>> + if (!len) >>> + continue; >>> + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT; >>> + >>> + rc = get_user_pages_fast(base, n, npages ? 1 : 0, >> >>npages controls whether this is a write? Why? > >We use npages as a flag here. In mp_sendmsg(), we called alloc_page_info() with npages = >0. > >> >>> + &info->pages[j]); >>> + if (rc != n) >>> + goto failed; >>> + >>> + while (n--) { >>> + frags[j].offset = base & ~PAGE_MASK; >>> + frags[j].size = min_t(int, len, >>> + PAGE_SIZE - frags[j].offset); >>> + len -= frags[j].size; >>> + base += frags[j].size; >>> + j++; >>> + } >>> + } >>> + >>> +#ifdef CONFIG_HIGHMEM >>> + if (npages && !(dev->features & NETIF_F_HIGHDMA)) { >>> + for (i = 0; i < j; i++) { >>> + if (PageHighMem(info->pages[i])) >>> + goto failed; >>> + } >>> + } >>> +#endif >> >>Are non-highdma devices worth bothering with? If yes - >>are there other limitations devices might have that we need to handle? >>E.g. what about non-s/g devices or no checksum offloading?. > >Basically I think there is no limitations for both, but let me have a check. > >> >>> + skb_push(skb, ETH_HLEN); >>> + >>> + if (skb_is_gso(skb)) { >>> + hdr.hdr.hdr_len = skb_headlen(skb); >>> + hdr.hdr.gso_size = shinfo->gso_size; >>> + if (shinfo->gso_type & SKB_GSO_TCPV4) >>> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4; >>> + else if (shinfo->gso_type & SKB_GSO_TCPV6) >>> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6; >>> + else if (shinfo->gso_type & SKB_GSO_UDP) >>> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP; >>> + else >>> + BUG(); >>> + if (shinfo->gso_type & SKB_GSO_TCP_ECN) >>> + hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN; >>> + >>> + } else >>> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE; >>> + >>> + if (skb->ip_summed == CHECKSUM_PARTIAL) { >>> + hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; >>> + hdr.hdr.csum_start = >>> + skb->csum_start - skb_headroom(skb); >>> + hdr.hdr.csum_offset = skb->csum_offset; >>> + } >> >>We have this code in tun, macvtap and packet socket already. >>Could this be a good time to move these into networking core? >>I'm not asking you to do this right now, but could this generic >>virtio-net to skb stuff be encapsulated in functions? > >It seems reasonable. > >> >>-- >>MST >-- >To unsubscribe from this list: send the line "unsubscribe netdev" in >the body of a message to majordomo@xxxxxxxxxxxxxxx >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html