On Sun, Aug 09, 2009 at 08:42:24PM +0000, Arnd Bergmann wrote:
> On Sunday 09 August 2009 08:02:16 Michael S. Tsirkin wrote:
> > On Thu, Aug 06, 2009 at 09:50:28PM +0000, Arnd Bergmann wrote:
> > > * The same framework in macvlan can be used to add a third backend
> > >   into a future kernel based virtio-net implementation.
> >
> > Could you split the patches up, to make this last easier?
> > patch 1 - export framework
> > patch 2 - code using it
>
> Sure, will do.
>
> > > +/* Get packet from user space buffer */
> > > +static ssize_t macvtap_get_user(struct macvtap_dev *vtap,
> > > +				const struct iovec *iv, size_t count,
> > > +				int noblock)
> > > +{
> > > +	struct sk_buff *skb;
> > > +	size_t len = count;
> > > +
> > > +	if (unlikely(len < ETH_HLEN))
> > > +		return -EINVAL;
> > > +
> > > +	skb = alloc_skb(NET_IP_ALIGN + len, GFP_KERNEL);
> > > +
> > > +	if (!skb) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	skb_reserve(skb, NET_IP_ALIGN);
> > > +	skb_put(skb, count);
> > > +
> > > +	if (skb_copy_datagram_from_iovec(skb, 0, iv, 0, len)) {
> > > +		vtap->m.dev->stats.rx_dropped++;
> > > +		kfree_skb(skb);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	skb_set_network_header(skb, ETH_HLEN);
> > > +	skb->dev = vtap->m.lowerdev;
> > > +
> > > +	macvlan_start_xmit(skb, vtap->m.dev);
> > > +
> > > +	return count;
> > > +}
> >
> > With tap, we discovered that not limiting the number of outstanding
> > skbs hurts UDP performance.  And the solution was to limit
> > the number of outstanding packets - with hacks to work around
> > the fact that userspace .
>
> Something seems to be missing in your last sentence here.

Most userspace does not seem to implement software flow control for
UDP, even though it probably should.

> My driver OTOH is also missing any sort of flow control in both
> RX and TX direction ;) For RX, there should probably just be
> a limit of frames that get buffered in the ring.
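
To make the RX idea concrete, here is a minimal sketch of a bounded
frame queue that drops (and counts) frames once a limit is reached.
It is a userspace model only, not the macvtap code: struct rx_queue,
RXQ_LIMIT, rxq_enqueue() and rxq_dequeue() are all hypothetical names
made up for illustration. In the driver itself the equivalent check
would presumably be a comparison of skb_queue_len() on the read queue
against some limit before queuing, but that is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the macvtap code. */
#define RXQ_LIMIT 4		/* assumed cap on buffered frames */

struct frame {
	size_t len;
};

struct rx_queue {
	struct frame *slots[RXQ_LIMIT];
	size_t head;		/* index of the oldest buffered frame */
	size_t count;		/* number of frames currently buffered */
	unsigned long dropped;	/* frames refused because the queue was full */
};

/* Queue a frame, or count it as dropped when the limit is reached. */
static bool rxq_enqueue(struct rx_queue *q, struct frame *f)
{
	assert(q && f);

	if (q->count == RXQ_LIMIT) {
		q->dropped++;
		return false;
	}
	q->slots[(q->head + q->count) % RXQ_LIMIT] = f;
	q->count++;
	return true;
}

/* Remove and return the oldest frame, or NULL when the queue is empty. */
static struct frame *rxq_dequeue(struct rx_queue *q)
{
	struct frame *f;

	assert(q);

	if (q->count == 0)
		return NULL;
	f = q->slots[q->head];
	q->head = (q->head + 1) % RXQ_LIMIT;
	q->count--;
	return f;
}
```

Dropping at the tail keeps the producer from ever blocking, which is
the simplest policy for the receive side; whether that is the right
policy here is a separate question.
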
>
> For TX, I guess there should be a way to let the packet
> scheduler handle this and give us a chance to block and
> unblock at the right time. I haven't found out yet how to
> do that.
>
> Would it be enough to check the dev_queue_xmit() return
> code for NETDEV_TX_BUSY?
>
> How would I get notified when it gets free again?

You can do this by creating a socket. Look at how tun does this now.

> > > +	ret = skb_copy_datagram_iovec(skb, 0, iv, len);
> > > +
> > > +	vtap->m.dev->stats.rx_packets++;
> > > +	vtap->m.dev->stats.rx_bytes += len;
> >
> > where does atomicity guarantee for these counters come from?
>
> AFAIK, we never do for any driver. They are statistics only and
> need not be 100% correct, so the networking stack goes for
> lower overhead and 99.9% correct.
>
> > > +static ssize_t macvtap_aio_read(struct kiocb *iocb, const struct iovec *iv,
> > > +				unsigned long count, loff_t pos)
> > > +{
> > > +	struct file *file = iocb->ki_filp;
> > > +	struct macvtap_dev *vtap = file->private_data;
> > > +	DECLARE_WAITQUEUE(wait, current);
> > > +	struct sk_buff *skb;
> > > +	ssize_t len, ret = 0;
> > > +
> > > +	if (!vtap)
> > > +		return -EBADFD;
> > > +
> > > +	len = iov_length(iv, count);
> > > +	if (len < 0) {
> > > +		ret = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	add_wait_queue(&vtap->wait, &wait);
> > > +	while (len) {
> > > +		current->state = TASK_INTERRUPTIBLE;
> > > +
> > > +		/* Read frames from the queue */
> > > +		if (!(skb=skb_dequeue(&vtap->readq))) {
> > > +			if (file->f_flags & O_NONBLOCK) {
> > > +				ret = -EAGAIN;
> > > +				break;
> > > +			}
> > > +			if (signal_pending(current)) {
> > > +				ret = -ERESTARTSYS;
> > > +				break;
> > > +			}
> > > +			/* Nothing to read, let's sleep */
> > > +			schedule();
> > > +			continue;
> > > +		}
> > > +		ret = macvtap_put_user(vtap, skb, (struct iovec *) iv, len);
> >
> > Don't cast away the constness. Instead, fix macvtap_put_user
> > to use skb_copy_datagram_const_iovec which does not modify the iovec.
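
For illustration, the const-preserving approach can be sketched in
plain userspace C: instead of advancing iov_base/iov_len in place
(which is what forces the cast), the copy takes a destination offset
and keeps its position in local variables. This is only a model of
the idea, not the kernel helper; copy_to_const_iovec() and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical userspace helper: copy len bytes from src into the
 * buffers described by iov, starting to_offset bytes into the iovec.
 * The iovec array is only read; the position is tracked locally.
 * Returns the number of bytes copied (short if the iovec is too small).
 */
static size_t copy_to_const_iovec(const void *src, size_t len,
				  const struct iovec *iov, int iovcnt,
				  size_t to_offset)
{
	const char *from = src;
	size_t copied = 0;
	int i;

	assert(src && iov);

	for (i = 0; i < iovcnt && copied < len; i++) {
		size_t seg = iov[i].iov_len;
		size_t n;

		/* Skip segments that lie entirely before the offset. */
		if (to_offset >= seg) {
			to_offset -= seg;
			continue;
		}
		/* Copy as much as fits in the rest of this segment. */
		n = seg - to_offset;
		if (n > len - copied)
			n = len - copied;
		memcpy((char *)iov[i].iov_base + to_offset, from + copied, n);
		copied += n;
		to_offset = 0;
	}
	return copied;
}
```

A caller that reads several frames into one iovec would pass the
running byte count as to_offset, so the same const iovec can be
reused for every frame.
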
>
> Ah, good catch. I had copied that from the tun driver before you
> fixed it there and failed to fix it the right way when I adapted
> it for the new interface.
>
> Thanks for the review,
>
> 	Arnd <><
_______________________________________________
Bridge mailing list
Bridge@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/bridge