On 04/21/2017 12:05 AM, Jason Wang wrote:
>
> On 04/20/2017 23:34, Vlad Yasevich wrote:
>> On 04/17/2017 11:01 PM, Jason Wang wrote:
>>>
>>> On 04/16/2017 00:38, Vladislav Yasevich wrote:
>>>> Currently the virtio net header is fixed size, and adding things to it
>>>> is rather difficult to do. This series attempts to add the
>>>> infrastructure as well as some extensions that try to resolve some
>>>> deficiencies we currently have.
>>>>
>>>> First, the vnet header only has space for 16 flags. This may not be
>>>> enough in the future. The extensions will provide space for 32 possible
>>>> extension flags and 32 possible extensions. These flags will be carried
>>>> in the first pseudo extension header, the presence of which will be
>>>> determined by a flag in the virtio net header.
>>>>
>>>> The extensions themselves will immediately follow the extension header.
>>>> They will be added to the packet in the same order as they appear in
>>>> the extension flags. No padding is placed between the extensions, and
>>>> any extensions that are negotiated but not used by a given packet will
>>>> convert to trailing padding.
>>>
>>> Do we need an explicit padding (e.g. an extension) which could be
>>> controlled by each side?
>>
>> I don't think so. The size of the vnet header is set based on the
>> extensions negotiated. The one part I am not crazy about is that in the
>> case of a packet not using any extensions, the data is still placed after
>> the entire vnet header, which essentially adds a lot of padding. However,
>> that's really no different than if we simply grew the vnet header.
>>
>> The other thing I've tried before is putting extensions into their own sg
>> buffer, but that made it slower.
>
> Yes.
>
>>>> For example:
>>>>   | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
>>>
>>> Just some rough thoughts:
>>>
>>> - Is it better to use TLV instead of a bitmap here? One advantage of TLV
>>> is that the length is not limited by the length of the bitmap.
>>
>> But the disadvantage is that we add at least 4 bytes per extension of
>> just TL data. That makes this thing even longer.
>
> Yes, and it looks like the length is still limited by e.g. the length of T.

Not only that, but it is also limited by the size of skb->cb as a whole. So
putting extensions into a TLV style means we have fewer extensions for now,
until we get rid of the skb->cb usage.

>
>>> - For 1.1, do we really want something like the vnet header? AFAIK, it
>>> is not used by modern NICs; would it be better to pack all metadata into
>>> the descriptor itself? This may need some changes in tun/macvtap, but
>>> looks more PCIe friendly.
>>
>> That would really be ideal, and I've looked at this. There are small
>> issues of exposing the 'net metadata' of the descriptor to taps so they
>> can be filled in. The alternative is to use a different control structure
>> for the tap->qemu|vhost channel (which can be implementation specific)
>> and have qemu|vhost populate the 'net metadata' of the descriptor.
>
> Yes, this needs some thought. For vhost, things look a little bit easier;
> we can probably use msg_control.
>

We can use msg_control in qemu as well, can't we? It really is a question of
who is doing the work and the number of copies. I can take a closer look at
how it would look if we extend the descriptor with type-specific data. I
don't know if other users of virtio would benefit from it.
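To make the layout above concrete, here is a rough sketch of the bitmap
scheme. All flag values and struct/field names below are made up for
illustration; they are not what the patches actually define:

	/* Sketch only: illustrative names, not the actual patch contents. */
	#include <linux/types.h>
	#include <linux/virtio_net.h>

	/* hypothetical bit in virtio_net_hdr.flags: "extension header follows" */
	#define VIRTIO_NET_HDR_F_EXT		0x80

	/* pseudo extension header: a 32-bit bitmap of the extensions that follow */
	struct virtio_net_ext_hdr {
		__le32 flags;
	};

	/* hypothetical extension bits, in the order their payloads appear */
	#define VIRTIO_NET_EXT_F_IP6FRAG	(1u << 0)
	#define VIRTIO_NET_EXT_F_VLAN		(1u << 1)
	#define VIRTIO_NET_EXT_F_UDP_TUNNEL	(1u << 2)

	struct virtio_net_ext_ip6frag {
		__le32 frag_id;		/* guest-generated IPv6 fragment id */
	};

	struct virtio_net_ext_vlan {
		__be16 vlan_proto;	/* network byte order, as on the wire */
		__le16 vlan_tci;
	};

	/*
	 * The header size is fixed once extensions are negotiated: base
	 * header, plus the pseudo header, plus room for every negotiated
	 * extension. Extensions a given packet does not use simply become
	 * trailing padding.
	 */
	static inline size_t vnet_hdr_ext_len(u32 negotiated)
	{
		size_t len = sizeof(struct virtio_net_hdr_mrg_rxbuf) +
			     sizeof(struct virtio_net_ext_hdr);

		if (negotiated & VIRTIO_NET_EXT_F_IP6FRAG)
			len += sizeof(struct virtio_net_ext_ip6frag);
		if (negotiated & VIRTIO_NET_EXT_F_VLAN)
			len += sizeof(struct virtio_net_ext_vlan);
		return len;
	}

The point of the bitmap over TLV is exactly this: the header length is a
per-device constant once negotiation is done, at the cost of the trailing
padding discussed above.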
-vlad

> Thanks
>
>> Thanks
>> -vlad
>>
>>> Thanks
>>>
>>>> Extensions proposed in this series are:
>>>>  - IPv6 fragment id extension
>>>>    * Currently, the guest-generated fragment id is discarded and the
>>>>      host generates an IPv6 fragment id if the packet has to be
>>>>      fragmented. The code attempts to add time-based perturbation to id
>>>>      generation to make it harder to guess the next fragment id to be
>>>>      used. However, doing this on the host may result in less
>>>>      perturbation (due to different timing) and might make id guessing
>>>>      easier. Ideally, the ids generated by the guest should be used.
>>>>      One could also argue that we are "violating" the IPv6 protocol
>>>>      under a _strict_ interpretation of the spec.
>>>>
>>>>  - VLAN header acceleration
>>>>    * Currently virtio does not do vlan header acceleration and instead
>>>>      uses software tagging. One of the first things that the host will
>>>>      do is strip the vlan header out. When passing the packet to a
>>>>      guest, the vlan header is re-inserted into the packet. We can skip
>>>>      all that work if we can pass the vlan data in accelerated format.
>>>>      Then the host will not do any extra work. However, so far, this
>>>>      yielded a very small perf bump (only ~1%). I am still looking into
>>>>      this.
>>>>
>>>>  - UDP tunnel offload
>>>>    * Similar to vlan acceleration, with this extension we can pass
>>>>      additional data to the host to support GSO with udp tunnels and
>>>>      possibly other encapsulations. This yields a significant
>>>>      performance improvement (still testing remote checksum code).
>>>>
>>>> An additional extension that is unfinished (due to still testing for
>>>> any side-effects) is checksum passthrough to support drivers that set
>>>> CHECKSUM_COMPLETE. This would eliminate the need for guests to compute
>>>> the software checksum.
>>>>
>>>> This series only takes care of virtio net. I have additional patches
>>>> for the host side (vhost and tap/macvtap as well as qemu), but wanted
>>>> to get feedback on the general approach first.
>>>>
>>>> Vladislav Yasevich (6):
>>>>   virtio-net: Remove the use of the padded vnet_header structure
>>>>   virtio-net: make header length handling uniform
>>>>   virtio_net: Add basic skeleton for handling vnet header extensions.
>>>>   virtio-net: Add support for IPv6 fragment id vnet header extension.
>>>>   virtio-net: Add support for vlan acceleration vnet header extension.
>>>>   virtio-net: Add support for UDP tunnel offload and extension.
>>>>
>>>>  drivers/net/virtio_net.c        | 132 +++++++++++++++++++++++++++++++++-------
>>>>  include/linux/skbuff.h          |   5 ++
>>>>  include/linux/virtio_net.h      |  91 ++++++++++++++++++++++++++-
>>>>  include/uapi/linux/virtio_net.h |  38 ++++++++++++
>>>>  4 files changed, 242 insertions(+), 24 deletions(-)
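P.S. To illustrate the ordering rule from the cover letter (payloads appear
in ascending flag-bit order, and negotiated-but-unused extensions become
trailing pad), a receive-side walk might look roughly like the following.
This builds on the sketch earlier in the thread; every name is still
illustrative, not what the patches define:

	/*
	 * Sketch of a receive-side extension walk. Because payloads appear
	 * in ascending flag-bit order with no padding between them, the
	 * receiver can walk them linearly.
	 */
	#include <linux/skbuff.h>
	#include <linux/if_vlan.h>

	static void vnet_hdr_parse_ext(const struct virtio_net_ext_hdr *ext,
				       struct sk_buff *skb)
	{
		const u8 *p = (const u8 *)(ext + 1);
		u32 flags = le32_to_cpu(ext->flags);

		if (flags & VIRTIO_NET_EXT_F_IP6FRAG) {
			const struct virtio_net_ext_ip6frag *f = (const void *)p;

			/* stash f->frag_id wherever the host GSO path will
			 * pick it up, instead of regenerating an id there */
			p += sizeof(*f);
		}
		if (flags & VIRTIO_NET_EXT_F_VLAN) {
			const struct virtio_net_ext_vlan *v = (const void *)p;

			__vlan_hwaccel_put_tag(skb, v->vlan_proto,
					       le16_to_cpu(v->vlan_tci));
			p += sizeof(*v);
		}
	}

Because the order is fixed by the flag bits, neither side needs
per-extension length fields, which is where the 4-bytes-per-extension TLV
overhead goes away.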