Hello Dmitry, Pavel, Ping! Cheers, Michael On 04/04/2016 10:34 AM, Michael Kerrisk (man-pages) wrote: > Hello Dmitry > > Thanks for taking the time to work on this! > > I will probably have more comments, for a future draft, but here are > a few initial comments. Could you take a look and send a new draft? > > On 03/16/2016 06:25 AM, Dmitry V. Levin wrote: >> From: Pavel Emelyanov <xemul@xxxxxxxxxxxxx> >> >> Cowritten-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx> >> Signed-off-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx> >> Signed-off-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx> >> --- >> man7/sock_diag.7 | 632 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 632 insertions(+) >> create mode 100644 man7/sock_diag.7 >> >> diff --git a/man7/sock_diag.7 b/man7/sock_diag.7 >> new file mode 100644 >> index 0000000..d1be9cf >> --- /dev/null >> +++ b/man7/sock_diag.7 >> @@ -0,0 +1,632 @@ >> +.\" Copyright (c) 2016 Pavel Emelyanov <xemul@xxxxxxxxxxxxx> >> +.\" Copyright (c) 2016 Dmitry V. Levin <ldv@xxxxxxxxxxxx> >> +.\" >> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL) >> +.\" This is free documentation; you can redistribute it and/or >> +.\" modify it under the terms of the GNU General Public License as >> +.\" published by the Free Software Foundation; either version 2 of >> +.\" the License, or (at your option) any later version. >> +.\" >> +.\" The GNU General Public License's references to "object code" >> +.\" and "executables" are to be interpreted as the output of any >> +.\" document formatting or typesetting system, including >> +.\" intermediate and printed output. >> +.\" >> +.\" This manual is distributed in the hope that it will be useful, >> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of >> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +.\" GNU General Public License for more details. 
>> +.\" >> +.\" You should have received a copy of the GNU General Public >> +.\" License along with this manual; if not, see >> +.\" <http://www.gnu.org/licenses/>. >> +.\" %%%LICENSE_END >> +.TH SOCK_DIAG 7 2016-03-14 "Linux" "Linux Programmer's Manual" >> +.SH NAME >> +sock_diag \- querying information about sockets >> +.SH SYNOPSIS >> +.nf >> +.B #include <sys/socket.h> >> +.B #include <linux/sock_diag.h> >> +.BR "#include <linux/unix_diag.h>" " /* for UNIX domain sockets */" >> +.BR "#include <linux/inet_diag.h>" " /* for IPv4 and IPv6 sockets */" >> + >> +.BI "diag_socket = socket(AF_NETLINK, " socket_type ", NETLINK_SOCK_DIAG);" >> +.fi >> +.SH DESCRIPTION >> +The sock_diag netlink subsystem provides a mechanism for querying information >> +about sockets of various protocol families from the kernel. This subsystem >> +can be used to query information about individual sockets or request a list of >> +those. > > Sorry, it's not clear what "those" refers to. Could you replace with a noun? > >> + >> +In the request the caller can specify the additional information it would >> +like to query about the socket, e.g. memory info or family-specific stuff. > > s/query/obtain/ > >> + >> +When requesting a list of sockets, the caller can specify filters >> +that would be applied by the kernel to select a subset of sockets to report. >> +For now there's only the ability to filter sockets by state (connected, >> +listening, etc.) >> + >> +Note that sock_diag reports only those sockets that have a name, >> +i.e. either bound explicitly with >> +.BR bind (2) >> +or auto-bound ones (e.g. connected). This is the same set of sockets that >> +is available via >> +.IR /proc/net/unix , >> +.IR /proc/net/tcp , >> +.IR /proc/net/udp , >> +etc. >> + >> +.SS Request >> +The request starts with >> +.I "struct nlmsghdr" >> +header that has >> +.I nlmsg_type >> +field set to >> +.BR SOCK_DIAG_BY_FAMILY . 
> > I think it might be useful to either show the nlmsghdr structure here, or > add a note to tell the reader that this structure definition is shown in > netlink(7). > >> +It is followed by a protocol family specific header that starts with a common >> +part shared by all protocol families: >> + >> +.in +4n >> +.nf >> +struct sock_diag_req { >> + __u8 sdiag_family; >> + __u8 sdiag_protocol; >> +}; >> +.fi >> +.in >> +.PP >> +The fields of this structure are as follows: >> +.TP >> +.I sdiag_family >> +The protocol family of querying sockets. It should be set to the appropriate > > The meaning of "querying sockets" is unclear. Can you reword/elaborate? > >> +.B PF_* > > AF_* constant, I would say. POSIX doesn't talk about PF_*, and in practice > there's always a one-to-one relationship between AF_* and PF_*, so all other > man pages use AF_*. > >> +constant. >> +.TP >> +.I sdiag_protocol >> +Depends on >> +.IR sdiag_family . >> +It should be set to the appropriate >> +.B IPPROTO_* >> +constant for >> +.B PF_INET > > AF_... > >> +and >> +.BR PF_INET6, > > AF_... > >> +and to 0 otherwise. >> +.PP >> +If >> +.I nlmsg_flags >> +field of the >> +.I "struct nlmsghdr" >> +header has >> +.BR NLM_F_DUMP >> +flag set, then a list of sockets is being requested, >> +otherwise it is a query about an individual socket. >> + >> +.SS Response >> +The response starts with >> +.I "struct nlmsghdr" >> +header and is followed by an array of family-specific objects. >> +The array is to be accessed with the standard >> +.B NLMSG_* >> +macros from >> +.BR netlink (3) >> +API. >> +.PP >> +Each object is the NLA (netlink attributes) list that is to be accessed >> +with the >> +.B RTA_* >> +macros from >> +.BR rtnetlink (3) >> +API. 
>> + >> +.SS UNIX domain sockets >> +For UNIX domain sockets the request is represented in the following structure: >> + >> +.in +4n >> +.nf >> +struct unix_diag_req { >> + __u8 sdiag_family; >> + __u8 sdiag_protocol; >> + __u16 pad; >> + __u32 udiag_states; >> + __u32 udiag_ino; >> + __u32 udiag_show; >> + __u32 udiag_cookie[2]; >> +}; >> +.fi >> +.in >> +.PP >> +The fields of this structure are as follows: >> +.TP >> +.I sdiag_family >> +This is a protocol family, it should be set to >> +.BR PF_UNIX . >> +.PP >> +.I sdiag_protocol >> +.PD 0 >> +.TP >> +.PD >> +.I pad >> +These fields should be set to 0. >> +.TP >> +.I udiag_states >> +This is a bit mask that defines a filter of sockets states. >> +Only those sockets whose states are in this mask will be reported. >> +Ignored when querying for an individual socket. >> +Supported values are: >> +.PD 0 >> +.RS >> +.IP "" 2 >> +1 << >> +.B TCP_ESTABLISHED >> +.IP >> +1 << >> +.B TCP_LISTEN >> +.RE >> +.PD >> +.TP >> +.I udiag_ino >> +This is an inode number when querying for an individual socket. >> +Ignored for a bulk dump. > > The meaning of "bulk dump" is not clear. > >> +.TP >> +.I udiag_show >> +This is a set of flags defining which information to report. >> +Each requested info is reported back as a netlink attribute as described >> +below: >> +.RS >> +.IP "" 2 >> +.B UDIAG_SHOW_NAME >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_NAME . >> +The payload associated with this attribute is the name of the socket >> +to which it was bound (a sequence of bytes up to >> +.B UNIX_PATH_MAX >> +length). >> +.RE >> +.IP "" 2 >> +.B UDIAG_SHOW_VFS >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_VFS . 
>> +The payload associated with this attribute is represented in the following >> +structure: >> + >> +.in +4n >> +.nf >> +struct unix_diag_vfs { >> + __u32 udiag_vfs_dev; >> + __u32 udiag_vfs_ino; >> +}; >> +.fi >> +.in >> + >> +The fields of this structure are as follows: >> +.PD 0 >> +.RS 2 >> +.IP "" 2 >> +.I udiag_vfs_dev >> +The device number of the corresponding on-disk socket node. >> +.IP >> +.I udiag_vfs_ino >> +The inode number of the corresponding on-disk socket node. >> +.RE >> +.PD >> +.RE >> +.IP >> +.B UDIAG_SHOW_PEER >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_PEER . >> +The payload associated with this attribute is a __u32 value >> +which is the peer's inode number. >> +This attribute is reported for connected sockets only. >> +.RE >> +.IP >> +.B UDIAG_SHOW_ICONS >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_ICONS . >> +The payload associated with this attribute is an array of __u32 values >> +which are inode numbers of sockets that has passed the >> +.BR connect (2) >> +call, but hasn't been processed with >> +.BR accept (2) >> +yet. This attribute is reported for listening sockets only. >> +.RE >> +.IP >> +.B UDIAG_SHOW_RQLEN >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_RQLEN . >> +The payload associated with this attribute is represented in the following >> +structure: >> + >> +.in +4n >> +.nf >> +struct unix_diag_rqlen { >> + __u32 udiag_rqueue; >> + __u32 udiag_wqueue; >> +}; >> +.fi >> +.in >> + >> +The fields of this structure are as follows: >> +.PD 0 >> +.RS 2 >> +.IP "" 2 >> +.I udiag_rqueue >> +.RS 6 >> +.IP "listening sockets:" 2 >> +The number of pending connections which equals to >> +.B UNIX_DIAG_ICONS >> +array length. >> +.IP "established sockets:" >> +The amount of data in incoming queue. 
>> +.RE >> +.IP >> +.I udiag_wqueue >> +.RS 6 >> +.IP "listening sockets:" 2 >> +The backlog length which equals to the value passed as the second argument to >> +.BR listen (2). >> +.IP "established sockets:" >> +The amount of memory available for sending. >> +.RE >> +.RE >> +.PD >> +.RE >> +.IP >> +.B UDIAG_SHOW_MEMINFO >> +.RS 4 >> +The attribute reported in answer to this request is >> +.BR UNIX_DIAG_MEMINFO . >> +The payload associated with this attribute is an array of __u32 values >> +described below in "Socket memory information" subsection. >> +.RE >> +.IP >> +.RE >> +.RS >> +The following attributes are reported back without any specific request: >> +.IP "" 2 >> +.BR UNIX_DIAG_SHUTDOWN . >> +The payload associated with this attribute is __u8 value which represents >> +bits of >> +.BR shutdown (2) >> +state. >> +.RE >> +.TP >> +.I udiag_cookie >> +This is service field, both its cells should be set to \-1. > > What does "service field" mean? I think this needs to be clarified. > >> +.PP >> +The response to a query for UNIX domain sockets is represented as an array of >> + >> +.in +4n >> +.nf >> +struct unix_diag_msg { >> + __u8 udiag_family; >> + __u8 udiag_type; >> + __u8 udiag_state; >> + __u8 pad; >> + __u32 udiag_ino; >> + __u32 udiag_cookie[2]; >> +}; >> +.fi >> +.in >> + >> +followed by netlink attributes. >> +.PP >> +The fields of this structure are as follows: >> +.TP >> +.I udiag_type >> +This is set to one of the following constants: >> +.PD 0 >> +.RS >> +.IP "" 2 >> +.B SOCK_PACKET >> +.IP >> +.B SOCK_STREAM >> +.IP >> +.B SOCK_SEQPACKET >> +.RE >> +.PD >> +.TP >> +.I udiag_state >> +This is set to one of the following constants: >> +.PD 0 >> +.RS >> +.IP "" 2 >> +.B TCP_LISTEN >> +.IP >> +.B TCP_ESTABLISHED >> +.RE >> +.PD >> +.TP >> +.I udiag_ino >> +This is the socket inode number. >> +.PP >> +.I udiag_family >> +.PD 0 >> +.PP >> +.I pad >> +.TP >> +.I udiag_cookie >> +These fields have the same meaning as in >> +.IR "struct unix_diag_req" . 
>> +.PD >> + >> +.SS IPv4 and IPv6 sockets >> +For IPv4 and IPv6 sockets the request is represented in the following structure: >> + >> +.in +4n >> +.nf >> +struct inet_diag_req_v2 { >> + __u8 sdiag_family; >> + __u8 sdiag_protocol; >> + __u8 idiag_ext; >> + __u8 pad; >> + __u32 idiag_states; >> + struct inet_diag_sockid id; >> +}; >> +.fi >> +.in >> + >> +where >> +.I "struct inet_diag_sockid" >> +is defined as follows: >> + >> +.in +4n >> +.nf >> +struct inet_diag_sockid { >> + __be16 idiag_sport; >> + __be16 idiag_dport; >> + __be32 idiag_src[4]; >> + __be32 idiag_dst[4]; >> + __u32 idiag_if; >> + __u32 idiag_cookie[2]; >> +}; >> +.fi >> +.in >> +.PP >> +The fields of >> +.I "struct inet_diag_req_v2" >> +are as follows: >> +.TP >> +.I sdiag_family >> +This should be set to either >> +.B PF_INET >> +or >> +.B PF_INET6 >> +for >> +.B IPv4 >> +or >> +.B IPv6 >> +sockets respectively. >> +.TP >> +.I sdiag_protocol >> +This should be set to one of the following constants: >> +.PD 0 >> +.RS >> +.IP "" 2 >> +.B IPPROTO_TCP >> +.IP >> +.B IPPROTO_UDP >> +.IP >> +.B IPPROTO_UDPLITE >> +.RE >> +.PD >> +.TP >> +.I idiag_ext >> +This is a set of flags defining which extended information to report. >> +Each requested info is reported back as a netlink attribute as described >> +below: >> +.RS >> +.IP "" 2 > > Replace the last line with > > .TP > >> +.B INET_DIAG_TOS >> +The payload associated with this attribute is a __u8 value >> +which is the TOS of the socket. >> +.IP > > .TP > >> +.B INET_DIAG_TCLASS >> +The payload associated with this attribute is a __u8 value >> +which is the TClass of the socket. IPv6 sockets only. >> +For LISTEN and CLOSE sockets this is followed by >> +.B INET_DIAG_SKV6ONLY >> +attribute with associated __u8 payload value meaning whether the socket >> +is IPv6-only or not. 
>> +.IP > > .TP > >> +.B INET_DIAG_MEMINFO >> +The payload associated with this attribute is represented in the following >> +structure: >> + >> +.in +4n >> +.nf >> +struct inet_diag_meminfo { >> + __u32 idiag_rmem; >> + __u32 idiag_wmem; >> + __u32 idiag_fmem; >> + __u32 idiag_tmem; >> +}; >> +.fi >> +.in >> + >> +The fields of this structure are as follows: >> +.PD 0 > > Delete previous line. > >> +.RS 2 > > Make the last line > > .RS > >> +.IP "" 2 > > Change the last line to > > .TP 12 > >> +.I idiag_rmem >> +The amount of data in the receive queue. >> +.IP > > .TP > >> +.I idiag_wmem >> +The amount of data that is queued by TCP but not yet sent. >> +.IP > > .TP > >> +.I idiag_fmem >> +The amount of memory scheduled for future use (TCP only). >> +.IP > > .TP > >> +.I idiag_tmem >> +The amount of data in send queue. >> +.RE >> +.PD > > Remove previous line > >> +.IP > > .TP > >> +.B INET_DIAG_SKMEMINFO >> +The payload associated with this attribute is an array of __u32 values >> +described below in "Socket memory information" subsection. >> +.IP > > .TP > >> +.B INET_DIAG_INFO >> +The payload associated with this attribute is protocol specific. >> +For TCP sockets it is an object of type >> +.IR "struct tcp_info" . >> +.IP > > .TP > >> +.B INET_DIAG_CONG >> +The payload associated with this attribute is a string that describes the >> +congestion control algorithm used. For TCP sockets only. >> +.RE >> +.TP >> +.I pad >> +This should be set to 0. >> +.TP >> +.I idiag_states >> +This is a bit mask that defines a filter of sockets states. >> +Only those sockets whose states are in this mask will be reported. >> +Ignored when querying for an individual socket. >> +.TP >> +.I id >> +This is a socket id object that is used in dump requests, in queries >> +about individual sockets, and is reported back in each response. >> +Unlike UNIX domain sockets, IPv4 and IPv6 sockets are identified >> +using addresses and ports. All values are in network byte order. 
>> +.PP >> +The fields of >> +.I "struct inet_diag_sockid" >> +are as follows: >> +.TP >> +.I idiag_sport >> +The source port. >> +.TP >> +.I idiag_dport >> +The destination port. >> +.TP >> +.I idiag_src >> +The source address. >> +.TP >> +.I idiag_dst >> +The destination address. >> +.TP >> +.I idiag_if >> +The interface number the socket is bound to. >> +.TP >> +.I idiag_cookie >> +This is a service field, both its cells should be set to \-1. > > Again, I think the meaning of "service field" needs to be clarified. > >> +.PP >> +The response to a query for IPv4 or IPv6 sockets is represented as an array of >> + >> +.in +4n >> +.nf >> +struct inet_diag_msg { >> + __u8 idiag_family; >> + __u8 idiag_state; >> + __u8 idiag_timer; >> + __u8 idiag_retrans; >> + >> + struct inet_diag_sockid id; >> + >> + __u32 idiag_expires; >> + __u32 idiag_rqueue; >> + __u32 idiag_wqueue; >> + __u32 idiag_uid; >> + __u32 idiag_inode; >> +}; >> +.fi >> +.in >> + >> +followed by netlink attributes. >> +.PP >> +The fields of this structure are as follows: >> +.TP >> +.I idiag_family >> +This is the same field as in >> +.IR "struct inet_diag_req_v2" . >> +.TP >> +.I idiag_state >> +This denotes socket state as in >> +.IR "struct inet_diag_req_v2" . >> +.PP >> +.I idiag_timer >> +.PD 0 >> +.PP >> +.I idiag_retrans >> +.TP >> +.I idiag_expires >> +These fields are TCP-only and represent the timeout that is currently >> +in action for particular TCP state (0 for established sockets). > > Can you elaborate with an example of a state that has a timeout? > (Is this just TIME_WAIT, or others also?) > >> +.PD >> +.TP >> +.I idiag_rqueue >> +.RS 7 >> +.IP "listening sockets:" 2 >> +The number of pending connections. >> +.IP "other sockets:" >> +The amount of data in incoming queue. >> +.RE >> +.TP >> +.I idiag_wqueue >> +.RS 7 >> +.IP "listening sockets:" 2 >> +The backlog length. >> +.IP "other sockets:" >> +The amount of memory available for sending. 
>> +.RE >> +.TP >> +.I idiag_uid >> +This is the socket owner UID. >> +.TP >> +.I idiag_inode >> +This is the socket inode number. >> + >> +.SS Socket memory information >> +The payload associated with >> +.B UNIX_DIAG_MEMINFO >> +and >> +.BR INET_DIAG_SKMEMINFO >> +netlink attributes is an array of the following __u32 values: >> +.TP >> +.B SK_MEMINFO_RMEM_ALLOC >> +The amount of data in receive queue. >> +.TP >> +.B SK_MEMINFO_RCVBUF >> +The receive socket buffer as set by >> +.BR SO_RCVBUF . >> +.TP >> +.B SK_MEMINFO_WMEM_ALLOC >> +The amount of data in send queue. >> +.TP >> +.B SK_MEMINFO_SNDBUF >> +The send socket buffer as set by >> +.BR SO_SNDBUF . >> +.TP >> +.B SK_MEMINFO_FWD_ALLOC >> +The amount of memory scheduled for future use (TCP only). >> +.TP >> +.B SK_MEMINFO_WMEM_QUEUED >> +The amount of data queued by TCP, but not yet sent. >> +.TP >> +.B SK_MEMINFO_OPTMEM >> +The amount of memory allocated for socket's service needs (e.g. socket >> +filter). >> +.TP >> +.B SK_MEMINFO_BACKLOG >> +The amount of packets in the backlog (not yet processed). >> +.SH CONFORMING TO >> +The NETLINK_SOCK_DIAG API is Linux-specific. >> +.SH VERSIONS >> +.B NETLINK_SOCK_DIAG >> +was introduced in Linux 3.3. >> +.PP >> +.B UNIX_DIAG_MEMINFO >> +and >> +.BR INET_DIAG_SKMEMINFO >> +were introduced in Linux 3.6. >> +.SH SEE ALSO >> +.BR netlink (3), >> +.BR rtnetlink (3), >> +.BR netlink (7) > > In the next iteration, I think it would be simplest to just include > the example program in the same patch. > > Thanks, > > Michael > > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
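As a reading aid for the request layout quoted in the draft, here is a minimal C sketch of how a list ("dump") request for UNIX domain sockets might be assembled: a struct nlmsghdr with nlmsg_type set to SOCK_DIAG_BY_FAMILY and NLM_F_DUMP set, followed directly by struct unix_diag_req. This is not the example program requested for the next iteration. The names unix_dump_request, build_unix_dump_request and send_unix_dump_request are hypothetical, the AF_UNIX spelling follows the review comments, and setting both udiag_cookie cells to -1 simply follows the wording of the draft.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>

/* Hypothetical wrapper: the netlink header immediately followed by the
 * family-specific request, as described in the "Request" subsection. */
struct unix_dump_request {
    struct nlmsghdr nlh;
    struct unix_diag_req udr;
};

/* Fill in a request for a list of UNIX domain sockets.
 * 'states' is the udiag_states bit mask, 'show' is a set of UDIAG_SHOW_* flags. */
void build_unix_dump_request(struct unix_dump_request *req,
                             __u32 states, __u32 show)
{
    memset(req, 0, sizeof(*req));          /* also zeroes sdiag_protocol and pad */

    req->nlh.nlmsg_len = sizeof(*req);
    req->nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    /* NLM_F_DUMP asks for a list rather than one individual socket. */
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;

    req->udr.sdiag_family = AF_UNIX;
    req->udr.udiag_states = states;
    req->udr.udiag_show = show;
    /* Assumption taken from the draft text: both cookie cells are -1. */
    req->udr.udiag_cookie[0] = (__u32)-1;
    req->udr.udiag_cookie[1] = (__u32)-1;
}

/* Send the request to the kernel over an already opened sock_diag socket.
 * Returns 0 on success and -1 on error (errno is set by sendmsg). */
int send_unix_dump_request(int fd, const struct unix_dump_request *req)
{
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    struct iovec iov = {
        .iov_base = (void *)req,
        .iov_len = sizeof(*req),
    };
    struct msghdr msg = {
        .msg_name = &kernel,
        .msg_namelen = sizeof(kernel),
        .msg_iov = &iov,
        .msg_iovlen = 1,
    };

    return sendmsg(fd, &msg, 0) < 0 ? -1 : 0;
}
```

A caller would open the socket with socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG), build the request with, for example, states set to (__u32)-1 (every state) and show set to UDIAG_SHOW_NAME | UDIAG_SHOW_PEER, send it, and then walk the reply with the NLMSG_* and RTA_* macros mentioned in the "Response" subsection.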