Re: Improving PACKET_{RX,TX}_RING documentation

Thanks for the feedback, Daniel and Willem.

Executive summary: We need a good concept for distributing (preferably
mutually exclusive) information among packet(7) and packet_mmap.txt.

Willem de Bruijn wrote:
>>     0. Perhaps a general writeup on how the RX/TX_RING works in Linux,
>>        its layout, constraints etc. Btw, not sure if that's also
> 
> This would duplicate the contents of
> Documentation/networking/packet_mmap.txt? I would caution against
> having two sources of documentation that may become inconsistent over
> time. A detailed discussion could also become too long for a manual
> page: packet_mmap.txt is already 1067 lines (albeit about half in
> example code). If that document is confusing, a thorough edit of that
> would be very helpful, though.
> 
>>        included already, but the same mmap-technique exists also for
>>        netlink sockets.
> 
> See also Documentation/networking/netlink_mmap.txt . If the ring is a
> generic netlink feature (i.e., not specific to nfnetlink), then man 7
> netlink is the right place for user documentation (in as far as this
> is a user-oriented feature).

Some deduplication between netlink(7), packet(7), netlink_mmap.txt and
packet_mmap.txt is probably a good idea. However, this is much more than
I initially bargained for :)
I had a superficial look at the netlink documents, and most concepts
appear to be very much alike. The operational aspects are a notable
exception, though: using the netlink header is a lot simpler than using
tpacket_hdr and its different versions.

IMHO user-space API documentation should reside in the man page and not
in Documentation/, but I'd like to hear Michael's opinion on that. Maybe
it's a good idea to have at least a basic description in packet(7) and
reserve packet_mmap.txt for the more advanced topics?
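As an illustration of the kind of basic description that could go into
packet(7), here is a minimal sketch of setting up a TPACKET_V2 RX ring.
It assumes tp_reserve is left at 0 and mirrors the geometry checks the
kernel applies when PACKET_RX_RING is set: the block size is a multiple
of the page size, the frame size is a multiple of TPACKET_ALIGNMENT and
at least TPACKET2_HDRLEN, and tp_frame_nr equals frames per block times
tp_block_nr. The helper names ring_req_v2() and rx_ring_setup_v2() are
hypothetical.

```c
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_packet.h>

/* Fill a struct tpacket_req for a TPACKET_V2 ring, rejecting geometries
 * the kernel would refuse (tp_reserve assumed to be 0). Returns 0 on
 * success, -1 otherwise. */
int ring_req_v2(struct tpacket_req *req, unsigned int block_size,
                unsigned int block_nr, unsigned int frame_size)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || block_nr == 0 || block_size == 0 ||
        block_size % (unsigned long)page != 0)
        return -1;      /* blocks must consist of whole pages */
    if (frame_size < TPACKET2_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;      /* room for the header, 16-byte aligned */
    if (frame_size > block_size)
        return -1;      /* at least one frame per block */

    req->tp_block_size = block_size;
    req->tp_block_nr   = block_nr;
    req->tp_frame_size = frame_size;
    /* frames never straddle blocks; any tail of a block stays unused */
    req->tp_frame_nr   = block_nr * (block_size / frame_size);
    return 0;
}

/* Select TPACKET_V2, install the RX ring and map it into memory.
 * fd must be an AF_PACKET socket. Returns MAP_FAILED on error. */
void *rx_ring_setup_v2(int fd, const struct tpacket_req *req)
{
    int version = TPACKET_V2;

    /* PACKET_VERSION must be set before the ring is created */
    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0)
        return MAP_FAILED;
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) < 0)
        return MAP_FAILED;
    return mmap(NULL, (size_t)req->tp_block_size * req->tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

For example, ring_req_v2(&req, 4096, 64, 2048) requests 128 frames of
2048 bytes on a system with 4 KiB pages. The socket passed to
rx_ring_setup_v2() is assumed to come from
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)), which requires
CAP_NET_RAW.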

>>>>       1. Increase detail of PACKET_{RX,TX}_RING socket options, including
>>>>          description of struct tpacket_hdr and anything else required to
>>>>          operate the ring.
> 
> If expanding the man page, then moving mmap into a separate section
> sounds good to me. If a man page is more user documentation than
> kernel Documentation/ then perhaps start by discussing the pros and
> cons of mmapped rings over recv, to help users decide whether to
> use the mmapped ring or, for instance, batch with recvmmsg().

Actually I wanted to maintain the structure of the man page and
describe everything inside the appropriate "Socket options" sections.
However, adding a reference to recvmmsg() is a good idea as well, and
would justify creating a new section for "Advanced packet socket
techniques".
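For the recvmmsg() alternative, a minimal batching sketch might look like
this. It assumes fd is an AF_PACKET SOCK_RAW socket, but recvmmsg() works
on any datagram socket; BATCH, SNAPLEN and recv_batch() are hypothetical
names chosen for the example. One system call returns up to BATCH frames
instead of one, which is the per-packet cost recvmmsg() saves compared
with plain recvfrom().

```c
#define _GNU_SOURCE             /* recvmmsg() is a GNU extension */
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH   8
#define SNAPLEN 2048

/* Receive up to BATCH frames with a single recvmmsg() call. Returns the
 * number of frames stored in bufs/lens, 0 if nothing was pending, or -1
 * on error. */
int recv_batch(int fd, char bufs[BATCH][SNAPLEN], size_t lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];
    int n;

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = SNAPLEN;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take whatever is queued right now, never block */
    n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;    /* bytes received into bufs[i] */
    return n;
}
```

A real receiver would typically poll() the socket for POLLIN and then
call recv_batch() until it returns 0.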

>>>>       2. Move some details from other sockopts (e.g. PACKET_LOSS) into
>>>>          *_RING.
> 
> Yes, please move all ring-specific details into the new ring section.
> 
>>>>       3. Add fully functional example source code for simple
>>>>          PACKET_{RX,TX}_RING operation (initialization and operation).
>>>>          This may be as much as 3 different example programs if I
>>>>          incorporate [2] and [3] in an appropriate manner. It might be a
>>>>          good idea to add a non-*_RING example as well.
>>
>>
>> Yes, some examples for mmap RX, mmap TX, fanout, and perhaps TPACKET_V3
>> might be great.
>>
>>
>>>>       4. Add a warning about inferior _TX_RING performance [1] which I
>>>>          suffered from only recently in the measurements I made for my
>>>>          thesis on Linux 3.14.
> 
> I would describe such points in a positive manner (optimization) as
> opposed to a negative (inferior performance).

Using positive wording is always a good idea, but packet_mmap.txt
already led me to believe that PACKET_TX_RING should be faster than
plain sendto(). Users should be able to make an informed decision,
which requires the man page to tell the (ugly) truth that sendto()
currently outperforms TX_RING.

> The optimization you refer to is to attach the tx-only packet socket
> to a protocol family that is never observed, so that no packets are
> looped back into the socket on receive. This is a great trick. There
> are probably others. Again, I believe that such details belong more in
> packet_mmap.txt than in the man page. But that is just one opinion, so
> I'll gladly defer to Michael and others on that point.

Since the TX-only optimization (socket(*, *, 0), i.e. protocol 0, so
that the socket never receives packets) is not TX_RING-specific, it
should be in the man page (IMHO). As I said above, packet_mmap.txt may
be a decent place for advanced and {RX,TX}_RING-specific techniques.
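A minimal sketch of the TX-only trick, assuming plain sendto() for
transmission: with protocol 0 the kernel does not register a receive
hook for the socket, so nothing is queued to it and transmitted frames
are not looped back into it. The function names tx_only_socket() and
send_frame() are hypothetical; the frame is assumed to be a complete
Ethernet frame, as SOCK_RAW expects, and the interface index can be
obtained with if_nametoindex(3).

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netpacket/packet.h>

/* Open an AF_PACKET socket used only for sending. Protocol 0 means the
 * socket receives nothing. Needs CAP_NET_RAW; returns -1 with errno set
 * on failure. */
int tx_only_socket(void)
{
    return socket(AF_PACKET, SOCK_RAW, 0);
}

/* Send one complete Ethernet frame (link-layer header included) out of
 * the interface with index ifindex. */
ssize_t send_frame(int fd, int ifindex, const unsigned char *frame, size_t len)
{
    struct sockaddr_ll sll;

    if (len < 14) {             /* shorter than an Ethernet header */
        errno = EINVAL;
        return -1;
    }
    memset(&sll, 0, sizeof(sll));
    sll.sll_family  = AF_PACKET;
    sll.sll_ifindex = ifindex;
    /* EtherType at bytes 12-13 is already in network byte order */
    memcpy(&sll.sll_protocol, frame + 12, sizeof(sll.sll_protocol));
    return sendto(fd, frame, len, 0, (const struct sockaddr *)&sll, sizeof(sll));
}
```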

>> Can you elaborate? Jesper made recently a nice summary on using trafgen
>> which uses TX_RING internally:
>>
>>   http://netoptimizer.blogspot.ch/2014/04/trafgen-fast-packet-generator.html

For my thesis I implemented a programmable switch in user space (for
better programmability and guaranteed API compatibility). Testing with
64-byte frames, I reached a maximum frame rate of ~0.6 Mpps using
{RX,TX}_RING, but ~0.87 Mpps using plain recvfrom()/sendto(). A hybrid
RX_RING/sendto() approach with separate RX/TX threads yielded the same
frame rate as tests with pktgen (~0.89 Mpps). Fun fact: Open vSwitch
reached ~1 Mpps and thus, surprisingly, surpassed pktgen. This might be
worth investigating.
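The receive half of such a hybrid design boils down to walking the RX
ring and handing frames back to the kernel. Here is a minimal sketch for
TPACKET_V2, assuming the ring was mapped with mmap() as described in
packet_mmap.txt and that the caller poll()s the socket for POLLIN before
draining. rx_ring_drain(), frame_cb and sum_snaplen() are hypothetical
names, and the GCC/Clang __atomic builtins stand in for whatever memory
barriers a real program uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical per-frame callback: receives the captured bytes. */
typedef void (*frame_cb)(const uint8_t *data, unsigned int len, void *arg);

/* Example callback: adds up the captured length of every frame. */
void sum_snaplen(const uint8_t *data, unsigned int len, void *arg)
{
    (void)data;
    *(unsigned long *)arg += len;
}

/* Starting at *cursor, hand every frame the kernel has released
 * (TP_STATUS_USER) to cb and return it to the kernel (TP_STATUS_KERNEL).
 * Frames are tp_frame_size apart and never straddle blocks.
 * Returns the number of frames consumed. */
unsigned int rx_ring_drain(uint8_t *ring, const struct tpacket_req *req,
                           unsigned int *cursor, frame_cb cb, void *arg)
{
    unsigned int per_block = req->tp_block_size / req->tp_frame_size;
    unsigned int done = 0;

    for (;;) {
        unsigned int i = *cursor;
        struct tpacket2_hdr *hdr = (struct tpacket2_hdr *)(ring +
            (size_t)(i / per_block) * req->tp_block_size +
            (size_t)(i % per_block) * req->tp_frame_size);

        /* acquire: read the payload only after seeing the status */
        if (!(__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) & TP_STATUS_USER))
            break;

        /* tp_mac is the offset of the link-layer header from hdr */
        cb((const uint8_t *)hdr + hdr->tp_mac, hdr->tp_snaplen, arg);

        /* release: finish with the frame before the kernel may reuse it */
        __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
        *cursor = (i + 1) % req->tp_frame_nr;
        done++;
    }
    return done;
}
```

The TX side of the hybrid then simply calls sendto() from its own
thread, so the two threads share no ring state.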

>> Absolutely, perhaps explaining differences from TPACKET_V1 -> V3 API and the
>> like.
> 
> That would be very interesting. The packet -> block batching mechanism
> likely was tested with small packet performance, but may have little
> benefit for larger packets. A discussion of the trade offs from a user
> point of view would be very interesting.

Actually I intended to deal only with TPACKET_V2 for now, since it is
simpler than TPACKET_V3 and can be used for both RX and TX. TPACKET_V3
can be added later on or could remain in packet_mmap.txt.
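Since TPACKET_V2 also covers TX, here is a matching sketch for the
transmit side: the application fills a TP_STATUS_AVAILABLE slot, marks
it TP_STATUS_SEND_REQUEST, and later triggers transmission of all marked
slots with a zero-length send(). It assumes PACKET_VERSION was set to
TPACKET_V2 before PACKET_TX_RING, that PACKET_TX_HAS_OFF is not used (so
the payload starts TPACKET2_HDRLEN - sizeof(struct sockaddr_ll) bytes
into the frame), and that the socket is bound to the egress interface.
tx_ring_queue(), tx_ring_flush() and TX_DATA_OFF are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Payload offset inside a TPACKET_V2 TX frame without PACKET_TX_HAS_OFF */
#define TX_DATA_OFF (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll))

/* Copy one frame into slot *cursor of a mapped TX ring and mark it
 * TP_STATUS_SEND_REQUEST. Returns 0, or -1 with errno set to EMSGSIZE
 * if the frame does not fit, or EAGAIN if the slot is not available
 * (still pending, being sent, or flagged TP_STATUS_WRONG_FORMAT). */
int tx_ring_queue(uint8_t *ring, const struct tpacket_req *req,
                  unsigned int *cursor, const void *frame, unsigned int len)
{
    unsigned int per_block = req->tp_block_size / req->tp_frame_size;
    unsigned int i = *cursor;
    struct tpacket2_hdr *hdr = (struct tpacket2_hdr *)(ring +
        (size_t)(i / per_block) * req->tp_block_size +
        (size_t)(i % per_block) * req->tp_frame_size);

    if (len > req->tp_frame_size - TX_DATA_OFF) {
        errno = EMSGSIZE;
        return -1;
    }
    if (__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) != TP_STATUS_AVAILABLE) {
        errno = EAGAIN;
        return -1;
    }
    memcpy((uint8_t *)hdr + TX_DATA_OFF, frame, len);
    hdr->tp_len = len;
    /* release: the payload must be visible before the kernel sees it */
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST, __ATOMIC_RELEASE);
    *cursor = (i + 1) % req->tp_frame_nr;
    return 0;
}

/* Ask the kernel to transmit every slot marked TP_STATUS_SEND_REQUEST. */
ssize_t tx_ring_flush(int fd)
{
    return send(fd, NULL, 0, 0);
}
```

Queuing several frames before a single tx_ring_flush() is what gives
the TX ring its batching; a slot becomes available again once the
kernel resets it to TP_STATUS_AVAILABLE after transmission.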


Cheers,
Carsten
