Hi Johann.

On Thu, Jul 30, 2009 at 1:04 AM, Johann Baudy<johann.baudy@xxxxxxxxxxx> wrote:
> From: Johann Baudy <johann.baudy@xxxxxxxxxxx>
>
> Documentation of PACKET_RX_RING and PACKET_TX_RING socket options.
>
> Signed-off-by: Johann Baudy <johann.baudy@xxxxxxxxxxx>

(Please CC me on patches. Otherwise I can easily miss them.)

The patch looks useful. Could you tell me how you got the info?
(It would help me try to verify it.) Also, in which kernel version
did these options first appear?

Thanks,

Michael

> --
>
> man7/packet.7 | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 212 insertions(+), 0 deletions(-)
>
> diff --git a/man7/packet.7 b/man7/packet.7
> index 0b6c669..ec4973a 100644
> --- a/man7/packet.7
> +++ b/man7/packet.7
> @@ -222,6 +222,218 @@ In addition the traditional ioctls
>  .BR SIOCADDMULTI ,
>  .B SIOCDELMULTI
>  can be used for the same purpose.
> +
> +Packet sockets can also be used to access a network device directly
> +through configurable circular buffers mapped into user space.
> +They can be used to either send or receive packets.
> +
> +.B PACKET_TX_RING
> +enables and allocates a circular buffer for packet transmission.
> +
> +.B PACKET_RX_RING
> +enables and allocates a circular buffer for packet capture.
> +
> +They both expect a
> +.B tpacket_req
> +structure as argument:
> +
> +.in +4n
> +.nf
> +struct tpacket_req {
> +    unsigned int tp_block_size; /* Minimal size of contiguous block */
> +    unsigned int tp_block_nr; /* Number of blocks */
> +    unsigned int tp_frame_size; /* Size of frame */
> +    unsigned int tp_frame_nr; /* Total number of frames */
> +};
> +.fi
> +.in
> +
> +This structure establishes a circular buffer of unswappable memory.
> +Mapping the buffer into the capturing process allows reading the captured frames and
> +related meta-information, such as timestamps, without requiring a system call.
> +Mapping the buffer into the transmitting process allows writing multiple packets that will be sent during
> +.BR send (2).
> +Using a buffer shared between the kernel and user space also has
> +the benefit of minimizing packet copies.
> +
> +Frames are grouped in blocks. Each block is a physically contiguous
> +region of memory and holds
> +.B tp_block_size
> +/
> +.B tp_frame_size
> +frames.
> +
> +The total number of blocks is
> +.BR tp_block_nr .
> +Note that
> +.B tp_frame_nr
> +is a redundant parameter because
> +
> +.in +4n
> +frames_per_block = tp_block_size/tp_frame_size
> +.in
> +
> +Indeed, the kernel function packet_set_ring() checks that the following condition holds:
> +
> +.in +4n
> +frames_per_block * tp_block_nr == tp_frame_nr
> +.in
> +
> +A frame can be of any size, provided that it fits in a block. A block
> +can hold only an integer number of frames; in other words, a frame cannot
> +span two blocks. Please refer to
> +.I networking/packet_mmap.txt
> +in the kernel documentation for more details.
> +
> +Each frame contains a header followed by data.
> +The header is either a
> +.B struct tpacket_hdr
> +or
> +.B struct tpacket2_hdr
> +according to the socket option
> +.B PACKET_VERSION
> +(which can be set to
> +.B TPACKET_V1
> +or
> +.B TPACKET_V2
> +respectively through
> +.BR setsockopt (2)
> +).
> +
> +With
> +.BR TPACKET_V1 :
> +
> +.in +4n
> +.nf
> +struct tpacket_hdr
> +{
> +    unsigned long tp_status;
> +    unsigned int tp_len;
> +    unsigned int tp_snaplen;
> +    unsigned short tp_mac;
> +    unsigned short tp_net;
> +    unsigned int tp_sec;
> +    unsigned int tp_usec;
> +};
> +.fi
> +.in
> +
> +With
> +.BR TPACKET_V2 :
> +
> +.in +4n
> +.nf
> +struct tpacket2_hdr
> +{
> +    __u32 tp_status;
> +    __u32 tp_len;
> +    __u32 tp_snaplen;
> +    __u16 tp_mac;
> +    __u16 tp_net;
> +    __u32 tp_sec;
> +    __u32 tp_nsec;
> +    __u16 tp_vlan_tci;
> +};
> +.fi
> +.in
> +
> +.B tp_len
> +is the size of the data received from the network.
> +
> +.B tp_snaplen
> +is the size of the data that follows the header.
> +
> +.B tp_mac
> +is the offset of the link-layer (MAC) header (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_net
> +is the offset of the network-layer header (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_sec
> +,
> +.B tp_usec
> +hold the timestamp of the received packet (
> +.B PACKET_RX_RING
> +only).
> +
> +.B tp_status
> +is the status of the current frame.
> +
> +For
> +.BR PACKET_TX_RING ,
> +status can be
> +.B TP_STATUS_AVAILABLE
> +if the frame is available for a new packet transmission;
> +.B TP_STATUS_SEND_REQUEST
> +if the frame has been filled by the user for transmission;
> +.B TP_STATUS_SENDING
> +if the frame is currently in transmission within the kernel;
> +.B TP_STATUS_WRONG_FORMAT
> +if the frame is not properly formatted (this status is used only if the socket option
> +.B PACKET_LOSS
> +is set to 1).
> +
> +For
> +.BR PACKET_RX_RING ,
> +a status equal to
> +.B TP_STATUS_KERNEL
> +indicates that the frame is available to the kernel;
> +.B TP_STATUS_USER
> +indicates that the kernel has received a packet (the frame is ready for the user);
> +.B TP_STATUS_COPY
> +indicates that the frame (and associated meta-information)
> +has been truncated because it is larger than
> +.B tp_frame_size
> +;
> +.B TP_STATUS_LOSING
> +indicates that packets have been dropped since the last time
> +statistics were checked with
> +.BR getsockopt (2)
> +and the
> +.B PACKET_STATISTICS
> +option;
> +.B TP_STATUS_CSUMNOTREADY
> +is used for outgoing IP packets whose checksum will be computed in hardware.
> +
> +To use this shared memory, the user must call
> +.BR mmap (2)
> +on the packet socket. The subsequent procedure depends on the socket option:
> +
> +For
> +.BR PACKET_TX_RING ,
> +the kernel initializes all frames to
> +.BR TP_STATUS_AVAILABLE .
> +To send a packet, the user fills the data buffer of an available frame, sets tp_len to
> +the size of the data in the buffer, and sets the frame's status field to
> +.BR TP_STATUS_SEND_REQUEST .
> +This can be done on multiple frames. Once the user is ready to transmit, it
> +calls
> +.BR send (2) .
> +Then all buffers with status equal to
> +.B TP_STATUS_SEND_REQUEST
> +are forwarded to the network device.
> +The kernel sets the status of each frame being sent to
> +.B TP_STATUS_SENDING
> +until the end of the transfer.
> +At the end of each transfer, the frame's status returns to
> +.BR TP_STATUS_AVAILABLE .
> +
> +For
> +.BR PACKET_RX_RING ,
> +the kernel initializes all frames to
> +.BR TP_STATUS_KERNEL .
> +When the kernel
> +receives a packet, it puts it in the buffer and updates the status with
> +at least the
> +.B TP_STATUS_USER
> +flag. The user can then read the packet.
> +Once the packet has been read, the user must zero the status field so that the kernel
> +can use that frame buffer again.
> +
>  .SS Ioctls
>  .B SIOCGSTAMP
>  can be used to receive the timestamp of the last received packet.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-man" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Watch my Linux system programming book progress to publication!
http://blog.man7.org/