Hello Dmitry, Pavel,

Ping!

On 05/18/2016 10:37 PM, Michael Kerrisk (man-pages) wrote:
> Hello Dmitry, Pavel,
>
> Ping!
>
> Cheers,
>
> Michael
>
>
>
> On 04/04/2016 10:34 AM, Michael Kerrisk (man-pages) wrote:
>> Hello Dmitry
>>
>> Thanks for taking the time to work on this!
>>
>> I will probably have more comments, for a future draft, but here are
>> a few initial comments. Could you take a look and send a new draft?
>>
>> On 03/16/2016 06:25 AM, Dmitry V. Levin wrote:
>>> From: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
>>>
>>> Cowritten-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx>
>>> Signed-off-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
>>> Signed-off-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx>
>>> ---
>>>  man7/sock_diag.7 | 632 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 632 insertions(+)
>>>  create mode 100644 man7/sock_diag.7
>>>
>>> diff --git a/man7/sock_diag.7 b/man7/sock_diag.7
>>> new file mode 100644
>>> index 0000000..d1be9cf
>>> --- /dev/null
>>> +++ b/man7/sock_diag.7
>>> @@ -0,0 +1,632 @@
>>> +.\" Copyright (c) 2016 Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
>>> +.\" Copyright (c) 2016 Dmitry V. Levin <ldv@xxxxxxxxxxxx>
>>> +.\"
>>> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
>>> +.\" This is free documentation; you can redistribute it and/or
>>> +.\" modify it under the terms of the GNU General Public License as
>>> +.\" published by the Free Software Foundation; either version 2 of
>>> +.\" the License, or (at your option) any later version.
>>> +.\"
>>> +.\" The GNU General Public License's references to "object code"
>>> +.\" and "executables" are to be interpreted as the output of any
>>> +.\" document formatting or typesetting system, including
>>> +.\" intermediate and printed output.
>>> +.\"
>>> +.\" This manual is distributed in the hope that it will be useful,
>>> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> +.\" GNU General Public License for more details.
>>> +.\"
>>> +.\" You should have received a copy of the GNU General Public
>>> +.\" License along with this manual; if not, see
>>> +.\" <http://www.gnu.org/licenses/>.
>>> +.\" %%%LICENSE_END
>>> +.TH SOCK_DIAG 7 2016-03-14 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +sock_diag \- querying information about sockets
>>> +.SH SYNOPSIS
>>> +.nf
>>> +.B #include <sys/socket.h>
>>> +.B #include <linux/sock_diag.h>
>>> +.BR "#include <linux/unix_diag.h>" " /* for UNIX domain sockets */"
>>> +.BR "#include <linux/inet_diag.h>" " /* for IPv4 and IPv6 sockets */"
>>> +
>>> +.BI "diag_socket = socket(AF_NETLINK, " socket_type ", NETLINK_SOCK_DIAG);"
>>> +.fi
>>> +.SH DESCRIPTION
>>> +The sock_diag netlink subsystem provides a mechanism for querying information
>>> +about sockets of various protocol families from the kernel. This subsystem
>>> +can be used to query information about individual sockets or request a list of
>>> +those.
>>
>> Sorry, it's not clear what "those" refers to. Could you replace with a noun?
>>
>>> +
>>> +In the request the caller can specify the additional information it would
>>> +like to query about the socket, e.g. memory info or family-specific stuff.
>>
>> s/query/obtain/
>>
>>> +
>>> +When requesting a list of sockets, the caller can specify filters
>>> +that would be applied by the kernel to select a subset of sockets to report.
>>> +For now there's only the ability to filter sockets by state (connected,
>>> +listening, etc.)
>>> +
>>> +Note that sock_diag reports only those sockets that have a name,
>>> +i.e. either bound explicitly with
>>> +.BR bind (2)
>>> +or auto-bound ones (e.g. connected). This is the same set of sockets that
>>> +is available via
>>> +.IR /proc/net/unix ,
>>> +.IR /proc/net/tcp ,
>>> +.IR /proc/net/udp ,
>>> +etc.
>>> +
>>> +.SS Request
>>> +The request starts with
>>> +.I "struct nlmsghdr"
>>> +header that has
>>> +.I nlmsg_type
>>> +field set to
>>> +.BR SOCK_DIAG_BY_FAMILY .
>>
>> I think it might be useful to either show the nlmsghdr structure here, or
>> add a note to tell the reader that this structure definition is shown in
>> netlink(7).
>>
>>> +It is followed by a protocol family specific header that starts with a common
>>> +part shared by all protocol families:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct sock_diag_req {
>>> +    __u8 sdiag_family;
>>> +    __u8 sdiag_protocol;
>>> +};
>>> +.fi
>>> +.in
>>> +.PP
>>> +The fields of this structure are as follows:
>>> +.TP
>>> +.I sdiag_family
>>> +The protocol family of querying sockets. It should be set to the appropriate
>>
>> The meaning of "querying sockets" is unclear. Can you reword/elaborate?
>>
>>> +.B PF_*
>>
>> AF_* constant, I would say. POSIX doesn't talk about PF_*, and in practice
>> there's always a one-to-one relationship between AF_* and PF_*, so all other
>> man pages use AF_*.
>>
>>> +constant.
>>> +.TP
>>> +.I sdiag_protocol
>>> +Depends on
>>> +.IR sdiag_family .
>>> +It should be set to the appropriate
>>> +.B IPPROTO_*
>>> +constant for
>>> +.B PF_INET
>>
>> AF_...
>>
>>> +and
>>> +.BR PF_INET6,
>>
>> AF_...
>>
>>> +and to 0 otherwise.
>>> +.PP
>>> +If
>>> +.I nlmsg_flags
>>> +field of the
>>> +.I "struct nlmsghdr"
>>> +header has
>>> +.BR NLM_F_DUMP
>>> +flag set, then a list of sockets is being requested,
>>> +otherwise it is a query about an individual socket.
>>> +
>>> +.SS Response
>>> +The response starts with
>>> +.I "struct nlmsghdr"
>>> +header and is followed by an array of family-specific objects.
>>> +The array is to be accessed with the standard
>>> +.B NLMSG_*
>>> +macros from
>>> +.BR netlink (3)
>>> +API.
>>> +.PP
>>> +Each object is the NLA (netlink attributes) list that is to be accessed
>>> +with the
>>> +.B RTA_*
>>> +macros from
>>> +.BR rtnetlink (3)
>>> +API.
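To make the Request text above a bit more concrete, here is a minimal sketch of how the netlink header and the common two-byte prefix might be laid out in a buffer. The helper name build_diag_request() and its parameters are hypothetical, chosen only for illustration. A real request would carry the full family-specific structure (which begins with these same two fields), so the kernel would most likely reject a message containing the prefix alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>

/*
 * Hypothetical helper (not part of any API): fill buf with a
 * struct nlmsghdr followed by the common struct sock_diag_req prefix.
 * buf is assumed to be suitably aligned for struct nlmsghdr.
 * Returns the value stored in nlmsg_len.
 */
size_t build_diag_request(void *buf, size_t buflen,
                          unsigned char family, unsigned char protocol,
                          int dump)
{
    size_t len = NLMSG_LENGTH(sizeof(struct sock_diag_req));

    /* The caller must supply room for the aligned message. */
    assert(buflen >= NLMSG_ALIGN(len));
    memset(buf, 0, NLMSG_ALIGN(len));

    struct nlmsghdr *h = buf;
    h->nlmsg_len = len;
    h->nlmsg_type = SOCK_DIAG_BY_FAMILY;
    /* NLM_F_DUMP asks for a list; without it the query is about one socket. */
    h->nlmsg_flags = NLM_F_REQUEST | (dump ? NLM_F_DUMP : 0);

    struct sock_diag_req *req = NLMSG_DATA(h);
    req->sdiag_family = family;
    req->sdiag_protocol = protocol;

    return len;
}
```

A caller would pass AF_UNIX with protocol 0, or AF_INET/AF_INET6 with an IPPROTO_* value, as described in the field list above.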
>>> +
>>> +.SS UNIX domain sockets
>>> +For UNIX domain sockets the request is represented in the following structure:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct unix_diag_req {
>>> +    __u8 sdiag_family;
>>> +    __u8 sdiag_protocol;
>>> +    __u16 pad;
>>> +    __u32 udiag_states;
>>> +    __u32 udiag_ino;
>>> +    __u32 udiag_show;
>>> +    __u32 udiag_cookie[2];
>>> +};
>>> +.fi
>>> +.in
>>> +.PP
>>> +The fields of this structure are as follows:
>>> +.TP
>>> +.I sdiag_family
>>> +This is a protocol family, it should be set to
>>> +.BR PF_UNIX .
>>> +.PP
>>> +.I sdiag_protocol
>>> +.PD 0
>>> +.TP
>>> +.PD
>>> +.I pad
>>> +These fields should be set to 0.
>>> +.TP
>>> +.I udiag_states
>>> +This is a bit mask that defines a filter of sockets states.
>>> +Only those sockets whose states are in this mask will be reported.
>>> +Ignored when querying for an individual socket.
>>> +Supported values are:
>>> +.PD 0
>>> +.RS
>>> +.IP "" 2
>>> +1 <<
>>> +.B TCP_ESTABLISHED
>>> +.IP
>>> +1 <<
>>> +.B TCP_LISTEN
>>> +.RE
>>> +.PD
>>> +.TP
>>> +.I udiag_ino
>>> +This is an inode number when querying for an individual socket.
>>> +Ignored for a bulk dump.
>>
>> The meaning of "bulk dump" is not clear.
>>
>>> +.TP
>>> +.I udiag_show
>>> +This is a set of flags defining which information to report.
>>> +Each requested info is reported back as a netlink attribute as described
>>> +below:
>>> +.RS
>>> +.IP "" 2
>>> +.B UDIAG_SHOW_NAME
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_NAME .
>>> +The payload associated with this attribute is the name of the socket
>>> +to which it was bound (a sequence of bytes up to
>>> +.B UNIX_PATH_MAX
>>> +length).
>>> +.RE
>>> +.IP "" 2
>>> +.B UDIAG_SHOW_VFS
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_VFS .
>>> +The payload associated with this attribute is represented in the following
>>> +structure:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct unix_diag_vfs {
>>> +    __u32 udiag_vfs_dev;
>>> +    __u32 udiag_vfs_ino;
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +The fields of this structure are as follows:
>>> +.PD 0
>>> +.RS 2
>>> +.IP "" 2
>>> +.I udiag_vfs_dev
>>> +The device number of the corresponding on-disk socket node.
>>> +.IP
>>> +.I udiag_vfs_ino
>>> +The inode number of the corresponding on-disk socket node.
>>> +.RE
>>> +.PD
>>> +.RE
>>> +.IP
>>> +.B UDIAG_SHOW_PEER
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_PEER .
>>> +The payload associated with this attribute is a __u32 value
>>> +which is the peer's inode number.
>>> +This attribute is reported for connected sockets only.
>>> +.RE
>>> +.IP
>>> +.B UDIAG_SHOW_ICONS
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_ICONS .
>>> +The payload associated with this attribute is an array of __u32 values
>>> +which are inode numbers of sockets that has passed the
>>> +.BR connect (2)
>>> +call, but hasn't been processed with
>>> +.BR accept (2)
>>> +yet. This attribute is reported for listening sockets only.
>>> +.RE
>>> +.IP
>>> +.B UDIAG_SHOW_RQLEN
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_RQLEN .
>>> +The payload associated with this attribute is represented in the following
>>> +structure:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct unix_diag_rqlen {
>>> +    __u32 udiag_rqueue;
>>> +    __u32 udiag_wqueue;
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +The fields of this structure are as follows:
>>> +.PD 0
>>> +.RS 2
>>> +.IP "" 2
>>> +.I udiag_rqueue
>>> +.RS 6
>>> +.IP "listening sockets:" 2
>>> +The number of pending connections which equals to
>>> +.B UNIX_DIAG_ICONS
>>> +array length.
>>> +.IP "established sockets:"
>>> +The amount of data in incoming queue.
>>> +.RE
>>> +.IP
>>> +.I udiag_wqueue
>>> +.RS 6
>>> +.IP "listening sockets:" 2
>>> +The backlog length which equals to the value passed as the second argument to
>>> +.BR listen (2).
>>> +.IP "established sockets:"
>>> +The amount of memory available for sending.
>>> +.RE
>>> +.RE
>>> +.PD
>>> +.RE
>>> +.IP
>>> +.B UDIAG_SHOW_MEMINFO
>>> +.RS 4
>>> +The attribute reported in answer to this request is
>>> +.BR UNIX_DIAG_MEMINFO .
>>> +The payload associated with this attribute is an array of __u32 values
>>> +described below in "Socket memory information" subsection.
>>> +.RE
>>> +.IP
>>> +.RE
>>> +.RS
>>> +The following attributes are reported back without any specific request:
>>> +.IP "" 2
>>> +.BR UNIX_DIAG_SHUTDOWN .
>>> +The payload associated with this attribute is __u8 value which represents
>>> +bits of
>>> +.BR shutdown (2)
>>> +state.
>>> +.RE
>>> +.TP
>>> +.I udiag_cookie
>>> +This is service field, both its cells should be set to \-1.
>>
>> What does "service field" mean? I think this needs to be clarified.
>>
>>> +.PP
>>> +The response to a query for UNIX domain sockets is represented as an array of
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct unix_diag_msg {
>>> +    __u8 udiag_family;
>>> +    __u8 udiag_type;
>>> +    __u8 udiag_state;
>>> +    __u8 pad;
>>> +    __u32 udiag_ino;
>>> +    __u32 udiag_cookie[2];
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +followed by netlink attributes.
>>> +.PP
>>> +The fields of this structure are as follows:
>>> +.TP
>>> +.I udiag_type
>>> +This is set to one of the following constants:
>>> +.PD 0
>>> +.RS
>>> +.IP "" 2
>>> +.B SOCK_PACKET
>>> +.IP
>>> +.B SOCK_STREAM
>>> +.IP
>>> +.B SOCK_SEQPACKET
>>> +.RE
>>> +.PD
>>> +.TP
>>> +.I udiag_state
>>> +This is set to one of the following constants:
>>> +.PD 0
>>> +.RS
>>> +.IP "" 2
>>> +.B TCP_LISTEN
>>> +.IP
>>> +.B TCP_ESTABLISHED
>>> +.RE
>>> +.PD
>>> +.TP
>>> +.I udiag_ino
>>> +This is the socket inode number.
>>> +.PP
>>> +.I udiag_family
>>> +.PD 0
>>> +.PP
>>> +.I pad
>>> +.TP
>>> +.I udiag_cookie
>>> +These fields have the same meaning as in
>>> +.IR "struct unix_diag_req" .
>>> +.PD
>>> +
>>> +.SS IPv4 and IPv6 sockets
>>> +For IPv4 and IPv6 sockets the request is represented in the following structure:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct inet_diag_req_v2 {
>>> +    __u8 sdiag_family;
>>> +    __u8 sdiag_protocol;
>>> +    __u8 idiag_ext;
>>> +    __u8 pad;
>>> +    __u32 idiag_states;
>>> +    struct inet_diag_sockid id;
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +where
>>> +.I "struct inet_diag_sockid"
>>> +is defined as follows:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct inet_diag_sockid {
>>> +    __be16 idiag_sport;
>>> +    __be16 idiag_dport;
>>> +    __be32 idiag_src[4];
>>> +    __be32 idiag_dst[4];
>>> +    __u32 idiag_if;
>>> +    __u32 idiag_cookie[2];
>>> +};
>>> +.fi
>>> +.in
>>> +.PP
>>> +The fields of
>>> +.I "struct inet_diag_req_v2"
>>> +are as follows:
>>> +.TP
>>> +.I sdiag_family
>>> +This should be set to either
>>> +.B PF_INET
>>> +or
>>> +.B PF_INET6
>>> +for
>>> +.B IPv4
>>> +or
>>> +.B IPv6
>>> +sockets respectively.
>>> +.TP
>>> +.I sdiag_protocol
>>> +This should be set to one of the following constants:
>>> +.PD 0
>>> +.RS
>>> +.IP "" 2
>>> +.B IPPROTO_TCP
>>> +.IP
>>> +.B IPPROTO_UDP
>>> +.IP
>>> +.B IPPROTO_UDPLITE
>>> +.RE
>>> +.PD
>>> +.TP
>>> +.I idiag_ext
>>> +This is a set of flags defining which extended information to report.
>>> +Each requested info is reported back as a netlink attribute as described
>>> +below:
>>> +.RS
>>> +.IP "" 2
>>
>> Replace the last line with
>>
>> .TP
>>
>>> +.B INET_DIAG_TOS
>>> +The payload associated with this attribute is a __u8 value
>>> +which is the TOS of the socket.
>>> +.IP
>>
>> .TP
>>
>>> +.B INET_DIAG_TCLASS
>>> +The payload associated with this attribute is a __u8 value
>>> +which is the TClass of the socket. IPv6 sockets only.
>>> +For LISTEN and CLOSE sockets this is followed by
>>> +.B INET_DIAG_SKV6ONLY
>>> +attribute with associated __u8 payload value meaning whether the socket
>>> +is IPv6-only or not.
>>> +.IP
>>
>> .TP
>>
>>> +.B INET_DIAG_MEMINFO
>>> +The payload associated with this attribute is represented in the following
>>> +structure:
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct inet_diag_meminfo {
>>> +    __u32 idiag_rmem;
>>> +    __u32 idiag_wmem;
>>> +    __u32 idiag_fmem;
>>> +    __u32 idiag_tmem;
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +The fields of this structure are as follows:
>>> +.PD 0
>>
>> Delete previous line.
>>
>>> +.RS 2
>>
>> Make the last line
>>
>> .RS
>>
>>> +.IP "" 2
>>
>> Change the last line to
>>
>> .TP 12
>>
>>> +.I idiag_rmem
>>> +The amount of data in the receive queue.
>>> +.IP
>>
>> .TP
>>
>>> +.I idiag_wmem
>>> +The amount of data that is queued by TCP but not yet sent.
>>> +.IP
>>
>> .TP
>>
>>> +.I idiag_fmem
>>> +The amount of memory scheduled for future use (TCP only).
>>> +.IP
>>
>> .TP
>>
>>> +.I idiag_tmem
>>> +The amount of data in send queue.
>>> +.RE
>>> +.PD
>>
>> Remove previous line
>>
>>> +.IP
>>
>> .TP
>>
>>> +.B INET_DIAG_SKMEMINFO
>>> +The payload associated with this attribute is an array of __u32 values
>>> +described below in "Socket memory information" subsection.
>>> +.IP
>>
>> .TP
>>
>>> +.B INET_DIAG_INFO
>>> +The payload associated with this attribute is protocol specific.
>>> +For TCP sockets it is an object of type
>>> +.IR "struct tcp_info" .
>>> +.IP
>>
>> .TP
>>
>>> +.B INET_DIAG_CONG
>>> +The payload associated with this attribute is a string that describes the
>>> +congestion control algorithm used. For TCP sockets only.
>>> +.RE
>>> +.TP
>>> +.I pad
>>> +This should be set to 0.
>>> +.TP
>>> +.I idiag_states
>>> +This is a bit mask that defines a filter of sockets states.
>>> +Only those sockets whose states are in this mask will be reported.
>>> +Ignored when querying for an individual socket.
>>> +.TP
>>> +.I id
>>> +This is a socket id object that is used in dump requests, in queries
>>> +about individual sockets, and is reported back in each response.
>>> +Unlike UNIX domain sockets, IPv4 and IPv6 sockets are identified
>>> +using addresses and ports. All values are in network byte order.
>>> +.PP
>>> +The fields of
>>> +.I "struct inet_diag_sockid"
>>> +are as follows:
>>> +.TP
>>> +.I idiag_sport
>>> +The source port.
>>> +.TP
>>> +.I idiag_dport
>>> +The destination port.
>>> +.TP
>>> +.I idiag_src
>>> +The source address.
>>> +.TP
>>> +.I idiag_dst
>>> +The destination address.
>>> +.TP
>>> +.I idiag_if
>>> +The interface number the socket is bound to.
>>> +.TP
>>> +.I idiag_cookie
>>> +This is a service field, both its cells should be set to \-1.
>>
>> Again, I think the meaning of "service field" needs to be clarified.
>>
>>> +.PP
>>> +The response to a query for IPv4 or IPv6 sockets is represented as an array of
>>> +
>>> +.in +4n
>>> +.nf
>>> +struct inet_diag_msg {
>>> +    __u8 idiag_family;
>>> +    __u8 idiag_state;
>>> +    __u8 idiag_timer;
>>> +    __u8 idiag_retrans;
>>> +
>>> +    struct inet_diag_sockid id;
>>> +
>>> +    __u32 idiag_expires;
>>> +    __u32 idiag_rqueue;
>>> +    __u32 idiag_wqueue;
>>> +    __u32 idiag_uid;
>>> +    __u32 idiag_inode;
>>> +};
>>> +.fi
>>> +.in
>>> +
>>> +followed by netlink attributes.
>>> +.PP
>>> +The fields of this structure are as follows:
>>> +.TP
>>> +.I idiag_family
>>> +This is the same field as in
>>> +.IR "struct inet_diag_req_v2" .
>>> +.TP
>>> +.I idiag_state
>>> +This denotes socket state as in
>>> +.IR "struct inet_diag_req_v2" .
>>> +.PP
>>> +.I idiag_timer
>>> +.PD 0
>>> +.PP
>>> +.I idiag_retrans
>>> +.TP
>>> +.I idiag_expires
>>> +These fields are TCP-only and represent the timeout that is currently
>>> +in action for particular TCP state (0 for established sockets).
>>
>> Can you elaborate with an example of a state that has a timeout?
>> (Is this just TIME_WAIT, or others also?)
>>
>>> +.PD
>>> +.TP
>>> +.I idiag_rqueue
>>> +.RS 7
>>> +.IP "listening sockets:" 2
>>> +The number of pending connections.
>>> +.IP "other sockets:"
>>> +The amount of data in incoming queue.
>>> +.RE
>>> +.TP
>>> +.I idiag_wqueue
>>> +.RS 7
>>> +.IP "listening sockets:" 2
>>> +The backlog length.
>>> +.IP "other sockets:"
>>> +The amount of memory available for sending.
>>> +.RE
>>> +.TP
>>> +.I idiag_uid
>>> +This is the socket owner UID.
>>> +.TP
>>> +.I idiag_inode
>>> +This is the socket inode number.
>>> +
>>> +.SS Socket memory information
>>> +The payload associated with
>>> +.B UNIX_DIAG_MEMINFO
>>> +and
>>> +.BR INET_DIAG_SKMEMINFO
>>> +netlink attributes is an array of the following __u32 values:
>>> +.TP
>>> +.B SK_MEMINFO_RMEM_ALLOC
>>> +The amount of data in receive queue.
>>> +.TP
>>> +.B SK_MEMINFO_RCVBUF
>>> +The receive socket buffer as set by
>>> +.BR SO_RCVBUF .
>>> +.TP
>>> +.B SK_MEMINFO_WMEM_ALLOC
>>> +The amount of data in send queue.
>>> +.TP
>>> +.B SK_MEMINFO_SNDBUF
>>> +The send socket buffer as set by
>>> +.BR SO_SNDBUF .
>>> +.TP
>>> +.B SK_MEMINFO_FWD_ALLOC
>>> +The amount of memory scheduled for future use (TCP only).
>>> +.TP
>>> +.B SK_MEMINFO_WMEM_QUEUED
>>> +The amount of data queued by TCP, but not yet sent.
>>> +.TP
>>> +.B SK_MEMINFO_OPTMEM
>>> +The amount of memory allocated for socket's service needs (e.g. socket
>>> +filter).
>>> +.TP
>>> +.B SK_MEMINFO_BACKLOG
>>> +The amount of packets in the backlog (not yet processed).
>>> +.SH CONFORMING TO
>>> +The NETLINK_SOCK_DIAG API is Linux-specific.
>>> +.SH VERSIONS
>>> +.B NETLINK_SOCK_DIAG
>>> +was introduced in Linux 3.3.
>>> +.PP
>>> +.B UNIX_DIAG_MEMINFO
>>> +and
>>> +.BR INET_DIAG_SKMEMINFO
>>> +were introduced in Linux 3.6.
>>> +.SH SEE ALSO
>>> +.BR netlink (3),
>>> +.BR rtnetlink (3),
>>> +.BR netlink (7)
>>
>> In the next iteration, I think it would be simplest to just include
>> the example program in the same patch.
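As a rough idea of what the response-handling part of such an example could look like, here is a sketch that walks a receive buffer with the NLMSG_* macros and then walks each record's attributes with the RTA_* macros, as the Response section describes. The function name walk_unix_diag() and the choice to count UNIX_DIAG_PEER attributes are assumptions made for illustration; this is not the example program from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>

/*
 * Hypothetical helper: walk a buffer holding sock_diag replies for
 * UNIX domain sockets. Returns the number of unix_diag_msg records
 * seen and adds the number of UNIX_DIAG_PEER attributes to *peer_attrs.
 * buf is assumed to be suitably aligned for struct nlmsghdr.
 */
int walk_unix_diag(void *buf, int len, int *peer_attrs)
{
    struct nlmsghdr *h = buf;
    int records = 0;

    for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
        /* End of a dump, or an error report: stop walking. */
        if (h->nlmsg_type == NLMSG_DONE || h->nlmsg_type == NLMSG_ERROR)
            break;
        if (h->nlmsg_type != SOCK_DIAG_BY_FAMILY)
            continue;
        /* Skip messages too short to hold the fixed-size header. */
        if (h->nlmsg_len < NLMSG_LENGTH(sizeof(struct unix_diag_msg)))
            continue;

        struct unix_diag_msg *msg = NLMSG_DATA(h);
        /* Attributes start right after the fixed-size structure. */
        struct rtattr *attr = (struct rtattr *)(msg + 1);
        int attr_len = h->nlmsg_len - NLMSG_LENGTH(sizeof(*msg));

        records++;
        for (; RTA_OK(attr, attr_len); attr = RTA_NEXT(attr, attr_len)) {
            if (attr->rta_type == UNIX_DIAG_PEER &&
                RTA_PAYLOAD(attr) >= sizeof(__u32))
                (*peer_attrs)++;
        }
    }
    return records;
}
```

In a complete program the buffer would come from a recv(2) or recvmsg(2) call on the NETLINK_SOCK_DIAG socket, and NLMSG_ERROR replies would be inspected rather than silently ending the loop.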
>>
>> Thanks,
>>
>> Michael
>>
>>
>
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/