Hi Michael,

Any update on this patch? Do I need to work on it again?

Thanks in advance,
Johann

On Thu, Aug 20, 2009 at 7:52 AM, Johann Baudy <johann.baudy@xxxxxxxxxxx> wrote:
> Hi Michael,
>
>> The patch looks useful. Could you tell me how you got the info? (It
>> would help me try to verify it.)
> - networking/packet_mmap.txt (in kernel doc)
> - http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap (TX
> only, I've made this patch)
>
>> Also, what kernel version number did these options appear in?
> Normally next 2.6
>
> PS: Sorry for the slow reply, I was on vacation.
>
> Best regards,
> Johann
>
>
> On Fri, Jul 31, 2009 at 5:57 AM, Michael Kerrisk
> <mtk.manpages@xxxxxxxxxxxxxx> wrote:
>>
>> Hi Johann.
>>
>> On Thu, Jul 30, 2009 at 1:04 AM, Johann Baudy<johann.baudy@xxxxxxxxxxx> wrote:
>> > From: Johann Baudy <johann.baudy@xxxxxxxxxxx>
>> >
>> > Documentation of PACKET_RX_RING and PACKET_TX_RING socket options.
>> >
>> > Signed-off-by: Johann Baudy <johann.baudy@xxxxxxxxxxx>
>>
>> (Please CC me on patches. Otherwise I can easily miss them.)
>>
>> The patch looks useful. Could you tell me how you got the info? (It
>> would help me try to verify it.)
>>
>> Also, what kernel version number did these options appear in?
>>
>> Thanks,
>>
>> Michael
>> > --
>> >
>> > man7/packet.7 | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > 1 files changed, 212 insertions(+), 0 deletions(-)
>> >
>> > diff --git a/man7/packet.7 b/man7/packet.7
>> > index 0b6c669..ec4973a 100644
>> > --- a/man7/packet.7
>> > +++ b/man7/packet.7
>> > @@ -222,6 +222,218 @@ In addition the traditional ioctls
>> > .BR SIOCADDMULTI ,
>> > .B SIOCDELMULTI
>> > can be used for the same purpose.
>> > +
>> > +Packet sockets can also be used to access a network device directly
>> > +through configurable circular buffers mapped in user space.
>> > +They can be used to either send or receive packets.
>> > +
>> > +.B PACKET_TX_RING
>> > +enables and allocates a circular buffer for the transmission process.
>> > +
>> > +.B PACKET_RX_RING
>> > +enables and allocates a circular buffer for the capture process.
>> > +
>> > +They both expect a
>> > +.B tpacket_req
>> > +structure as argument:
>> > +
>> > +.in +4n
>> > +.nf
>> > +struct tpacket_req {
>> > +    unsigned int tp_block_size; /* Minimal size of contiguous block */
>> > +    unsigned int tp_block_nr; /* Number of blocks */
>> > +    unsigned int tp_frame_size; /* Size of frame */
>> > +    unsigned int tp_frame_nr; /* Total number of frames */
>> > +};
>> > +.fi
>> > +.in
>> > +
>> > +This structure establishes a circular buffer of unswappable memory.
>> > +Mapping it into the capture process allows reading the captured frames and
>> > +related meta-information like timestamps without requiring a system call.
>> > +Mapping it into the transmission process allows writing multiple packets that will be sent during
>> > +.BR send (2).
>> > +Using a shared buffer between the kernel and user space also has
>> > +the benefit of minimizing packet copies.
>> > +
>> > +Frames are grouped in blocks. Each block is a physically contiguous
>> > +region of memory and holds
>> > +.B tp_block_size
>> > +/
>> > +.B tp_frame_size
>> > +frames.
>> > +
>> > +The total number of blocks is
>> > +.BR tp_block_nr .
>> > +Note that
>> > +.B tp_frame_nr
>> > +is a redundant parameter because
>> > +
>> > +.in +4n
>> > +frames_per_block = tp_block_size / tp_frame_size
>> > +.in
>> > +
>> > +Indeed, packet_set_ring checks that the following condition is true:
>> > +
>> > +.in +4n
>> > +frames_per_block * tp_block_nr == tp_frame_nr
>> > +.in
>> > +
>> > +A frame can be of any size, provided that it fits in a block. A block
>> > +can only hold an integer number of frames, or in other words, a frame cannot
>> > +span two blocks. Please refer to
>> > +.I networking/packet_mmap.txt
>> > +in the kernel documentation for more details.
>> > +
>> > +Each frame contains a header followed by data.
>> > +The header is either a
>> > +.B struct tpacket_hdr
>> > +or
>> > +.B struct tpacket2_hdr
>> > +according to socket option
>> > +.B PACKET_VERSION
>> > +(which can be set to
>> > +.B TPACKET_V1
>> > +or
>> > +.B TPACKET_V2
>> > +respectively through
>> > +.BR setsockopt (2)
>> > +).
>> > +
>> > +With
>> > +.BR TPACKET_V1 :
>> > +
>> > +.in +4n
>> > +.nf
>> > +struct tpacket_hdr
>> > +{
>> > +    unsigned long tp_status;
>> > +    unsigned int tp_len;
>> > +    unsigned int tp_snaplen;
>> > +    unsigned short tp_mac;
>> > +    unsigned short tp_net;
>> > +    unsigned int tp_sec;
>> > +    unsigned int tp_usec;
>> > +};
>> > +.fi
>> > +.in
>> > +
>> > +With
>> > +.BR TPACKET_V2 :
>> > +
>> > +.in +4n
>> > +.nf
>> > +struct tpacket2_hdr
>> > +{
>> > +    __u32 tp_status;
>> > +    __u32 tp_len;
>> > +    __u32 tp_snaplen;
>> > +    __u16 tp_mac;
>> > +    __u16 tp_net;
>> > +    __u32 tp_sec;
>> > +    __u32 tp_nsec;
>> > +    __u16 tp_vlan_tci;
>> > +};
>> > +.fi
>> > +.in
>> > +
>> > +.B tp_len
>> > +is the size of the data received from the network.
>> > +
>> > +.B tp_snaplen
>> > +is the size of the data that follows the header.
>> > +
>> > +.B tp_mac
>> > +is the offset of the MAC (link-layer) header (
>> > +.B PACKET_RX_RING
>> > +only).
>> > +
>> > +.B tp_net
>> > +is the offset of the network-layer header (
>> > +.B PACKET_RX_RING
>> > +only).
>> > +
>> > +.B tp_sec
>> > +,
>> > +.B tp_usec
>> > +give the timestamp of the received packet (
>> > +.B PACKET_RX_RING
>> > +only).
>> > +
>> > +.B tp_status
>> > +is the status of the current frame.
>> > +
>> > +For
>> > +.BR PACKET_TX_RING ,
>> > +status can be
>> > +.B TP_STATUS_AVAILABLE
>> > +if the frame is available for new packet transmission;
>> > +.B TP_STATUS_SEND_REQUEST
>> > +if the frame is filled by user for transmission;
>> > +.B TP_STATUS_SENDING
>> > +if the frame is currently in transmission within the kernel;
>> > +.B TP_STATUS_WRONG_FORMAT
>> > +if the frame is not properly formatted (this status is only used if socket option
>> > +.B PACKET_LOSS
>> > +is not set).
>> > +
>> > +For
>> > +.BR PACKET_RX_RING ,
>> > +a status equal to
>> > +.B TP_STATUS_KERNEL
>> > +indicates that the frame is available to the kernel;
>> > +.B TP_STATUS_USER
>> > +indicates that the kernel has received a packet (the frame is ready for the user);
>> > +.B TP_STATUS_COPY
>> > +indicates that the frame (and associated meta information)
>> > +has been truncated because it is larger than
>> > +.B tp_frame_size
>> > +;
>> > +.B TP_STATUS_LOSING
>> > +indicates that packets have been dropped since the last time
>> > +statistics were checked with
>> > +.BR getsockopt (2)
>> > +and the
>> > +.B PACKET_STATISTICS
>> > +option;
>> > +.B TP_STATUS_CSUMNOTREADY
>> > +is used for outgoing IP packets whose checksum will be computed in hardware.
>> > +
>> > +In order to use this shared memory, the user must call
>> > +.BR mmap (2)
>> > +on the packet socket. The subsequent procedure depends on the socket option:
>> > +
>> > +For
>> > +.BR PACKET_TX_RING ,
>> > +the kernel initializes all frames to
>> > +.BR TP_STATUS_AVAILABLE .
>> > +To send a packet, the user fills the data buffer of an available frame, sets tp_len to
>> > +the current data buffer size and sets its status field to
>> > +.BR TP_STATUS_SEND_REQUEST .
>> > +This can be done on multiple frames. Once the user is ready to transmit, it
>> > +calls
>> > +.BR send (2) .
>> > +Then all buffers with status equal to
>> > +.B TP_STATUS_SEND_REQUEST
>> > +are forwarded to the network device.
>> > +The kernel sets the status of each frame being sent to
>> > +.B TP_STATUS_SENDING
>> > +until the end of the transfer.
>> > +At the end of each transfer, the buffer status returns to
>> > +.BR TP_STATUS_AVAILABLE .
>> > +
>> > +For
>> > +.BR PACKET_RX_RING ,
>> > +the kernel initializes all frames to
>> > +.BR TP_STATUS_KERNEL .
>> > +When the kernel
>> > +receives a packet, it puts it in the buffer and updates the status with
>> > +at least the
>> > +.B TP_STATUS_USER
>> > +flag. The user can then read the packet.
>> > +Once the packet is read, the user must zero the status field so that the kernel
>> > +can use that frame buffer again.
>> > +
>> > .SS Ioctls
>> > .B SIOCGSTAMP
>> > can be used to receive the timestamp of the last received packet.
>> >
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-man" in
>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Watch my Linux system programming book progress to publication!
>> http://blog.man7.org/
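
For anyone checking the ring geometry rule quoted in the patch (frames_per_block * tp_block_nr == tp_frame_nr), here is a minimal sketch in C. It is only an illustration under assumptions: struct ring_req is a local mirror of the tpacket_req layout shown above (real code would include <linux/if_packet.h>), and ring_frames_per_block() and ring_geometry_ok() are hypothetical helper names, not kernel or libc functions. The kernel may apply further checks (for example on alignment) that are not modelled here.

```c
#include <stdbool.h>

/* Local mirror of the tpacket_req layout quoted in the patch.
 * Assumption for illustration only; real code would use
 * struct tpacket_req from <linux/if_packet.h>. */
struct ring_req {
    unsigned int tp_block_size;  /* Minimal size of contiguous block */
    unsigned int tp_block_nr;    /* Number of blocks */
    unsigned int tp_frame_size;  /* Size of frame */
    unsigned int tp_frame_nr;    /* Total number of frames */
};

/* Hypothetical helper: number of whole frames that fit in one block.
 * Returns 0 if the frame size is zero or a frame does not fit in a
 * block, since a frame cannot span two blocks. */
unsigned int ring_frames_per_block(const struct ring_req *req)
{
    if (req->tp_frame_size == 0 || req->tp_frame_size > req->tp_block_size)
        return 0;
    return req->tp_block_size / req->tp_frame_size;
}

/* Hypothetical helper: checks only the condition quoted in the patch,
 * frames_per_block * tp_block_nr == tp_frame_nr. */
bool ring_geometry_ok(const struct ring_req *req)
{
    unsigned long long frames_per_block = ring_frames_per_block(req);

    if (frames_per_block == 0)
        return false;
    /* Widened multiplication so the product cannot wrap around. */
    return frames_per_block * req->tp_block_nr == req->tp_frame_nr;
}
```

For example, with 4096-byte blocks, 2048-byte frames and 64 blocks, each block holds 2 frames, so tp_frame_nr would have to be 128 for the condition to hold.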