Re: [PATCH 1/4] sock_diag.7: New page documenting NETLINK_SOCK_DIAG interface

Hello Dmitry

Thanks for taking the time to work on this!

I will probably have more comments on a future draft, but here are
a few initial ones. Could you take a look and send a new draft?

On 03/16/2016 06:25 AM, Dmitry V. Levin wrote:
> From: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> 
> Cowritten-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx>
> Signed-off-by: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> Signed-off-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx>
> ---
>  man7/sock_diag.7 | 632 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 632 insertions(+)
>  create mode 100644 man7/sock_diag.7
> 
> diff --git a/man7/sock_diag.7 b/man7/sock_diag.7
> new file mode 100644
> index 0000000..d1be9cf
> --- /dev/null
> +++ b/man7/sock_diag.7
> @@ -0,0 +1,632 @@
> +.\" Copyright (c) 2016 Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> +.\" Copyright (c) 2016 Dmitry V. Levin <ldv@xxxxxxxxxxxx>
> +.\"
> +.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
> +.\" This is free documentation; you can redistribute it and/or
> +.\" modify it under the terms of the GNU General Public License as
> +.\" published by the Free Software Foundation; either version 2 of
> +.\" the License, or (at your option) any later version.
> +.\"
> +.\" The GNU General Public License's references to "object code"
> +.\" and "executables" are to be interpreted as the output of any
> +.\" document formatting or typesetting system, including
> +.\" intermediate and printed output.
> +.\"
> +.\" This manual is distributed in the hope that it will be useful,
> +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
> +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +.\" GNU General Public License for more details.
> +.\"
> +.\" You should have received a copy of the GNU General Public
> +.\" License along with this manual; if not, see
> +.\" <http://www.gnu.org/licenses/>.
> +.\" %%%LICENSE_END
> +.TH SOCK_DIAG 7 2016-03-14 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +sock_diag \- querying information about sockets
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/socket.h>
> +.B #include <linux/sock_diag.h>
> +.BR "#include <linux/unix_diag.h>" " /* for UNIX domain sockets */"
> +.BR "#include <linux/inet_diag.h>" " /* for IPv4 and IPv6 sockets */"
> +
> +.BI "diag_socket = socket(AF_NETLINK, " socket_type ", NETLINK_SOCK_DIAG);"
> +.fi
> +.SH DESCRIPTION
> +The sock_diag netlink subsystem provides a mechanism for querying information
> +about sockets of various protocol families from the kernel.  This subsystem
> +can be used to query information about individual sockets or request a list of
> +those.

Sorry, it's not clear what "those" refers to. Could you replace with a noun?

> +
> +In the request the caller can specify the additional information it would
> +like to query about the socket, e.g. memory info or family-specific stuff.

s/query/obtain/

> +
> +When requesting a list of sockets, the caller can specify filters
> +that would be applied by the kernel to select a subset of sockets to report.
> +For now there's only the ability to filter sockets by state (connected,
> +listening, etc.)
> +
> +Note that sock_diag reports only those sockets that have a name,
> +i.e. either bound explicitly with
> +.BR bind (2)
> +or auto-bound ones (e.g. connected).  This is the same set of sockets that
> +is available via
> +.IR /proc/net/unix ,
> +.IR /proc/net/tcp ,
> +.IR /proc/net/udp ,
> +etc.
> +
> +.SS Request
> +The request starts with
> +.I "struct nlmsghdr"
> +header that has
> +.I nlmsg_type
> +field set to
> +.BR SOCK_DIAG_BY_FAMILY .

I think it might be useful to either show the nlmsghdr structure here, or
add a note to tell the reader that this structure definition is shown in
netlink(7).

> +It is followed by a protocol family specific header that starts with a common
> +part shared by all protocol families:
> +
> +.in +4n
> +.nf
> +struct sock_diag_req {
> +    __u8 sdiag_family;
> +    __u8 sdiag_protocol;
> +};
> +.fi
> +.in
> +.PP
> +The fields of this structure are as follows:
> +.TP
> +.I sdiag_family
> +The protocol family of querying sockets.  It should be set to the appropriate

The meaning of "querying sockets" is unclear. Can you reword/elaborate?

> +.B PF_*

AF_* constant, I would say. POSIX doesn't talk about PF_*, and in practice
there's always a one-to-one relationship between AF_* and PF_*, so all other
man pages use AF_*.

> +constant.
> +.TP
> +.I sdiag_protocol
> +Depends on
> +.IR sdiag_family .
> +It should be set to the appropriate
> +.B IPPROTO_*
> +constant for
> +.B PF_INET

AF_...

> +and
> +.BR PF_INET6,

AF_...

> +and to 0 otherwise.
> +.PP
> +If
> +.I nlmsg_flags
> +field of the
> +.I "struct nlmsghdr"
> +header has
> +.BR NLM_F_DUMP
> +flag set, then a list of sockets is being requested,
> +otherwise it is a query about an individual socket.
> +
> +.SS Response
> +The response starts with
> +.I "struct nlmsghdr"
> +header and is followed by an array of family-specific objects.
> +The array is to be accessed with the standard
> +.B NLMSG_*
> +macros from
> +.BR netlink (3)
> +API.
> +.PP
> +Each object is the NLA (netlink attributes) list that is to be accessed
> +with the
> +.B RTA_*
> +macros from
> +.BR rtnetlink (3)
> +API.
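
It might help to show how the two sets of macros combine. Here is a
rough sketch, assuming a buffer already filled by recv(2) that holds
unix_diag_msg objects. The function name count_unix_diag_attrs() is
made up for illustration and is not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>

/*
 * Hypothetical helper: walk the netlink messages in buf and return
 * how many attributes of type attr_type are attached to the
 * unix_diag_msg objects found there.
 */
static unsigned int
count_unix_diag_attrs(void *buf, int len, unsigned short attr_type)
{
    unsigned int count = 0;
    struct nlmsghdr *nlh;

    assert(buf != NULL);

    /* Outer loop: one iteration per netlink message (NLMSG_* macros) */
    for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        struct unix_diag_msg *diag;
        struct rtattr *attr;
        int rta_len;

        if (nlh->nlmsg_type == NLMSG_DONE ||
            nlh->nlmsg_type == NLMSG_ERROR)
            break;
        if (nlh->nlmsg_type != SOCK_DIAG_BY_FAMILY ||
            nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*diag)))
            continue;

        diag = NLMSG_DATA(nlh);
        rta_len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*diag));

        /* Inner loop: attributes following the fixed part (RTA_* macros) */
        for (attr = (struct rtattr *) (diag + 1);
             RTA_OK(attr, rta_len);
             attr = RTA_NEXT(attr, rta_len)) {
            if (attr->rta_type == attr_type)
                count++;
        }
    }

    return count;
}
```

The lengths are deliberately signed, so that a truncated trailing
message or attribute makes NLMSG_OK()/RTA_OK() fail instead of
wrapping around.
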
> +
> +.SS UNIX domain sockets
> +For UNIX domain sockets the request is represented in the following structure:
> +
> +.in +4n
> +.nf
> +struct unix_diag_req {
> +    __u8    sdiag_family;
> +    __u8    sdiag_protocol;
> +    __u16   pad;
> +    __u32   udiag_states;
> +    __u32   udiag_ino;
> +    __u32   udiag_show;
> +    __u32   udiag_cookie[2];
> +};
> +.fi
> +.in
> +.PP
> +The fields of this structure are as follows:
> +.TP
> +.I sdiag_family
> +This is a protocol family, it should be set to
> +.BR PF_UNIX .
> +.PP
> +.I sdiag_protocol
> +.PD 0
> +.TP
> +.PD
> +.I pad
> +These fields should be set to 0.
> +.TP
> +.I udiag_states
> +This is a bit mask that defines a filter of sockets states.
> +Only those sockets whose states are in this mask will be reported.
> +Ignored when querying for an individual socket.
> +Supported values are:
> +.PD 0
> +.RS
> +.IP "" 2
> +1 <<
> +.B TCP_ESTABLISHED
> +.IP
> +1 <<
> +.B TCP_LISTEN
> +.RE
> +.PD
> +.TP
> +.I udiag_ino
> +This is an inode number when querying for an individual socket.
> +Ignored for a bulk dump.

The meaning of "bulk dump" is not clear.
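
If "bulk dump" simply means a request with NLM_F_DUMP set, a short
example contrasting the two kinds of request might make that obvious.
Something like the sketch below, where the wrapper structure, the
function name, and the "ino == 0 means dump" convention are all my own
assumptions for illustration:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/unix_diag.h>

/* Hypothetical wrapper: netlink header followed by the UNIX request */
struct unix_diag_request {
    struct nlmsghdr nlh;
    struct unix_diag_req udr;
};

/*
 * Fill in a request.  If ino is 0, ask for a list of all UNIX domain
 * sockets (NLM_F_DUMP); otherwise ask about the single socket with
 * that inode number.
 */
static void
fill_unix_diag_request(struct unix_diag_request *req, __u32 ino)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));   /* also zeroes sdiag_protocol, pad */

    req->nlh.nlmsg_len = sizeof(*req);
    req->nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    req->nlh.nlmsg_flags = NLM_F_REQUEST;
    if (ino == 0)
        req->nlh.nlmsg_flags |= NLM_F_DUMP;

    req->udr.sdiag_family = AF_UNIX;
    req->udr.udiag_states = -1;     /* all states; ignored for one socket */
    req->udr.udiag_ino = ino;       /* ignored for a dump */
    req->udr.udiag_show = UDIAG_SHOW_NAME | UDIAG_SHOW_PEER;
    req->udr.udiag_cookie[0] = -1;
    req->udr.udiag_cookie[1] = -1;
}
```

The filled-in structure would then be sent on the diag_socket from
the SYNOPSIS, for example with sendmsg(2).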

> +.TP
> +.I udiag_show
> +This is a set of flags defining which information to report.
> +Each requested info is reported back as a netlink attribute as described
> +below:
> +.RS
> +.IP "" 2
> +.B UDIAG_SHOW_NAME
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_NAME .
> +The payload associated with this attribute is the name of the socket
> +to which it was bound (a sequence of bytes up to
> +.B UNIX_PATH_MAX
> +length).
> +.RE
> +.IP "" 2
> +.B UDIAG_SHOW_VFS
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_VFS .
> +The payload associated with this attribute is represented in the following
> +structure:
> +
> +.in +4n
> +.nf
> +struct unix_diag_vfs {
> +    __u32 udiag_vfs_dev;
> +    __u32 udiag_vfs_ino;
> +};
> +.fi
> +.in
> +
> +The fields of this structure are as follows:
> +.PD 0
> +.RS 2
> +.IP "" 2
> +.I udiag_vfs_dev
> +The device number of the corresponding on-disk socket node.
> +.IP
> +.I udiag_vfs_ino
> +The inode number of the corresponding on-disk socket node.
> +.RE
> +.PD
> +.RE
> +.IP
> +.B UDIAG_SHOW_PEER
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_PEER .
> +The payload associated with this attribute is a __u32 value
> +which is the peer's inode number.
> +This attribute is reported for connected sockets only.
> +.RE
> +.IP
> +.B UDIAG_SHOW_ICONS
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_ICONS .
> +The payload associated with this attribute is an array of __u32 values
> +which are inode numbers of sockets that has passed the
> +.BR connect (2)
> +call, but hasn't been processed with
> +.BR accept (2)
> +yet.  This attribute is reported for listening sockets only.
> +.RE
> +.IP
> +.B UDIAG_SHOW_RQLEN
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_RQLEN .
> +The payload associated with this attribute is represented in the following
> +structure:
> +
> +.in +4n
> +.nf
> +struct unix_diag_rqlen {
> +    __u32 udiag_rqueue;
> +    __u32 udiag_wqueue;
> +};
> +.fi
> +.in
> +
> +The fields of this structure are as follows:
> +.PD 0
> +.RS 2
> +.IP "" 2
> +.I udiag_rqueue
> +.RS 6
> +.IP "listening sockets:" 2
> +The number of pending connections which equals to
> +.B UNIX_DIAG_ICONS
> +array length.
> +.IP "established sockets:"
> +The amount of data in incoming queue.
> +.RE
> +.IP
> +.I udiag_wqueue
> +.RS 6
> +.IP "listening sockets:" 2
> +The backlog length which equals to the value passed as the second argument to
> +.BR listen (2).
> +.IP "established sockets:"
> +The amount of memory available for sending.
> +.RE
> +.RE
> +.PD
> +.RE
> +.IP
> +.B UDIAG_SHOW_MEMINFO
> +.RS 4
> +The attribute reported in answer to this request is
> +.BR UNIX_DIAG_MEMINFO .
> +The payload associated with this attribute is an array of __u32 values
> +described below in "Socket memory information" subsection.
> +.RE
> +.IP
> +.RE
> +.RS
> +The following attributes are reported back without any specific request:
> +.IP "" 2
> +.BR UNIX_DIAG_SHUTDOWN .
> +The payload associated with this attribute is __u8 value which represents
> +bits of
> +.BR shutdown (2)
> +state.
> +.RE
> +.TP
> +.I udiag_cookie
> +This is service field, both its cells should be set to \-1.

What does "service field" mean? I think this needs to be clarified.

> +.PP
> +The response to a query for UNIX domain sockets is represented as an array of
> +
> +.in +4n
> +.nf
> +struct unix_diag_msg {
> +    __u8    udiag_family;
> +    __u8    udiag_type;
> +    __u8    udiag_state;
> +    __u8    pad;
> +    __u32   udiag_ino;
> +    __u32   udiag_cookie[2];
> +};
> +.fi
> +.in
> +
> +followed by netlink attributes.
> +.PP
> +The fields of this structure are as follows:
> +.TP
> +.I udiag_type
> +This is set to one of the following constants:
> +.PD 0
> +.RS
> +.IP "" 2
> +.B SOCK_PACKET
> +.IP
> +.B SOCK_STREAM
> +.IP
> +.B SOCK_SEQPACKET
> +.RE
> +.PD
> +.TP
> +.I udiag_state
> +This is set to one of the following constants:
> +.PD 0
> +.RS
> +.IP "" 2
> +.B TCP_LISTEN
> +.IP
> +.B TCP_ESTABLISHED
> +.RE
> +.PD
> +.TP
> +.I udiag_ino
> +This is the socket inode number.
> +.PP
> +.I udiag_family
> +.PD 0
> +.PP
> +.I pad
> +.TP
> +.I udiag_cookie
> +These fields have the same meaning as in
> +.IR "struct unix_diag_req" .
> +.PD
> +
> +.SS IPv4 and IPv6 sockets
> +For IPv4 and IPv6 sockets the request is represented in the following structure:
> +
> +.in +4n
> +.nf
> +struct inet_diag_req_v2 {
> +    __u8    sdiag_family;
> +    __u8    sdiag_protocol;
> +    __u8    idiag_ext;
> +    __u8    pad;
> +    __u32   idiag_states;
> +    struct inet_diag_sockid id;
> +};
> +.fi
> +.in
> +
> +where
> +.I "struct inet_diag_sockid"
> +is defined as follows:
> +
> +.in +4n
> +.nf
> +struct inet_diag_sockid {
> +    __be16  idiag_sport;
> +    __be16  idiag_dport;
> +    __be32  idiag_src[4];
> +    __be32  idiag_dst[4];
> +    __u32   idiag_if;
> +    __u32   idiag_cookie[2];
> +};
> +.fi
> +.in
> +.PP
> +The fields of
> +.I "struct inet_diag_req_v2"
> +are as follows:
> +.TP
> +.I sdiag_family
> +This should be set to either
> +.B PF_INET
> +or
> +.B PF_INET6
> +for
> +.B IPv4
> +or
> +.B IPv6
> +sockets respectively.
> +.TP
> +.I sdiag_protocol
> +This should be set to one of the following constants:
> +.PD 0
> +.RS
> +.IP "" 2
> +.B IPPROTO_TCP
> +.IP
> +.B IPPROTO_UDP
> +.IP
> +.B IPPROTO_UDPLITE
> +.RE
> +.PD
> +.TP
> +.I idiag_ext
> +This is a set of flags defining which extended information to report.
> +Each requested info is reported back as a netlink attribute as described
> +below:
> +.RS
> +.IP "" 2

Replace the last line with

.TP

> +.B INET_DIAG_TOS
> +The payload associated with this attribute is a __u8 value
> +which is the TOS of the socket.
> +.IP

.TP

> +.B INET_DIAG_TCLASS
> +The payload associated with this attribute is a __u8 value
> +which is the TClass of the socket.  IPv6 sockets only.
> +For LISTEN and CLOSE sockets this is followed by
> +.B INET_DIAG_SKV6ONLY
> +attribute with associated __u8 payload value meaning whether the socket
> +is IPv6-only or not.
> +.IP

.TP

> +.B INET_DIAG_MEMINFO
> +The payload associated with this attribute is represented in the following
> +structure:
> +
> +.in +4n
> +.nf
> +struct inet_diag_meminfo {
> +    __u32 idiag_rmem;
> +    __u32 idiag_wmem;
> +    __u32 idiag_fmem;
> +    __u32 idiag_tmem;
> +};
> +.fi
> +.in
> +
> +The fields of this structure are as follows:
> +.PD 0

Delete previous line.

> +.RS 2

Make the last line

.RS

> +.IP "" 2

Change the last line to

.TP 12

> +.I idiag_rmem
> +The amount of data in the receive queue.
> +.IP

.TP

> +.I idiag_wmem
> +The amount of data that is queued by TCP but not yet sent.
> +.IP

.TP

> +.I idiag_fmem
> +The amount of memory scheduled for future use (TCP only).
> +.IP

.TP

> +.I idiag_tmem
> +The amount of data in send queue.
> +.RE
> +.PD

Remove previous line

> +.IP

.TP

> +.B INET_DIAG_SKMEMINFO
> +The payload associated with this attribute is an array of __u32 values
> +described below in "Socket memory information" subsection.
> +.IP

.TP

> +.B INET_DIAG_INFO
> +The payload associated with this attribute is protocol specific.
> +For TCP sockets it is an object of type
> +.IR "struct tcp_info" .
> +.IP

.TP

> +.B INET_DIAG_CONG
> +The payload associated with this attribute is a string that describes the
> +congestion control algorithm used.  For TCP sockets only.
> +.RE
> +.TP
> +.I pad
> +This should be set to 0.
> +.TP
> +.I idiag_states
> +This is a bit mask that defines a filter of sockets states.
> +Only those sockets whose states are in this mask will be reported.
> +Ignored when querying for an individual socket.
> +.TP
> +.I id
> +This is a socket id object that is used in dump requests, in queries
> +about individual sockets, and is reported back in each response.
> +Unlike UNIX domain sockets, IPv4 and IPv6 sockets are identified
> +using addresses and ports.  All values are in network byte order.
> +.PP
> +The fields of
> +.I "struct inet_diag_sockid"
> +are as follows:
> +.TP
> +.I idiag_sport
> +The source port.
> +.TP
> +.I idiag_dport
> +The destination port.
> +.TP
> +.I idiag_src
> +The source address.
> +.TP
> +.I idiag_dst
> +The destination address.
> +.TP
> +.I idiag_if
> +The interface number the socket is bound to.
> +.TP
> +.I idiag_cookie
> +This is a service field, both its cells should be set to \-1.

Again, I think the meaning of "service field" needs to be clarified.
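
An example of filling in the id for a query about one socket might
also help here, since it shows the byte-order requirement and the
cookie together. A sketch, assuming that INET_DIAG_NOCOOKIE from
<linux/inet_diag.h> is the symbolic form of the \-1 above (please
check), and with a function name I made up:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/inet_diag.h>

/*
 * Hypothetical helper: prepare a query about a single IPv4 TCP socket
 * identified by its source and destination address and port.
 * Returns 0 on success, or -1 if an address cannot be parsed.
 */
static int
fill_tcp4_query(struct inet_diag_req_v2 *req,
                const char *src, unsigned short sport,
                const char *dst, unsigned short dport)
{
    assert(req != NULL && src != NULL && dst != NULL);
    memset(req, 0, sizeof(*req));   /* pad, idiag_ext, idiag_if = 0 */

    req->sdiag_family = AF_INET;
    req->sdiag_protocol = IPPROTO_TCP;

    /* Ports and addresses are in network byte order */
    req->id.idiag_sport = htons(sport);
    req->id.idiag_dport = htons(dport);
    if (inet_pton(AF_INET, src, &req->id.idiag_src[0]) != 1 ||
        inet_pton(AF_INET, dst, &req->id.idiag_dst[0]) != 1)
        return -1;

    /* Assumption: "no cookie" is what the page means by -1 */
    req->id.idiag_cookie[0] = INET_DIAG_NOCOOKIE;
    req->id.idiag_cookie[1] = INET_DIAG_NOCOOKIE;

    return 0;
}
```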

> +.PP
> +The response to a query for IPv4 or IPv6 sockets is represented as an array of
> +
> +.in +4n
> +.nf
> +struct inet_diag_msg {
> +    __u8    idiag_family;
> +    __u8    idiag_state;
> +    __u8    idiag_timer;
> +    __u8    idiag_retrans;
> +
> +    struct inet_diag_sockid id;
> +
> +    __u32   idiag_expires;
> +    __u32   idiag_rqueue;
> +    __u32   idiag_wqueue;
> +    __u32   idiag_uid;
> +    __u32   idiag_inode;
> +};
> +.fi
> +.in
> +
> +followed by netlink attributes.
> +.PP
> +The fields of this structure are as follows:
> +.TP
> +.I idiag_family
> +This is the same field as in
> +.IR "struct inet_diag_req_v2" .
> +.TP
> +.I idiag_state
> +This denotes socket state as in
> +.IR "struct inet_diag_req_v2" .
> +.PP
> +.I idiag_timer
> +.PD 0
> +.PP
> +.I idiag_retrans
> +.TP
> +.I idiag_expires
> +These fields are TCP-only and represent the timeout that is currently
> +in action for particular TCP state (0 for established sockets).

Can you elaborate with an example of a state that has a timeout?
(Is this just TIME_WAIT, or others also?)

> +.PD
> +.TP
> +.I idiag_rqueue
> +.RS 7
> +.IP "listening sockets:" 2
> +The number of pending connections.
> +.IP "other sockets:"
> +The amount of data in incoming queue.
> +.RE
> +.TP
> +.I idiag_wqueue
> +.RS 7
> +.IP "listening sockets:" 2
> +The backlog length.
> +.IP "other sockets:"
> +The amount of memory available for sending.
> +.RE
> +.TP
> +.I idiag_uid
> +This is the socket owner UID.
> +.TP
> +.I idiag_inode
> +This is the socket inode number.
> +
> +.SS Socket memory information
> +The payload associated with
> +.B UNIX_DIAG_MEMINFO
> +and
> +.BR INET_DIAG_SKMEMINFO
> +netlink attributes is an array of the following __u32 values:
> +.TP
> +.B SK_MEMINFO_RMEM_ALLOC
> +The amount of data in receive queue.
> +.TP
> +.B SK_MEMINFO_RCVBUF
> +The receive socket buffer as set by
> +.BR SO_RCVBUF .
> +.TP
> +.B SK_MEMINFO_WMEM_ALLOC
> +The amount of data in send queue.
> +.TP
> +.B SK_MEMINFO_SNDBUF
> +The send socket buffer as set by
> +.BR SO_SNDBUF .
> +.TP
> +.B SK_MEMINFO_FWD_ALLOC
> +The amount of memory scheduled for future use (TCP only).
> +.TP
> +.B SK_MEMINFO_WMEM_QUEUED
> +The amount of data queued by TCP, but not yet sent.
> +.TP
> +.B SK_MEMINFO_OPTMEM
> +The amount of memory allocated for socket's service needs (e.g. socket
> +filter).
> +.TP
> +.B SK_MEMINFO_BACKLOG
> +The amount of packets in the backlog (not yet processed).
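
It may be worth saying explicitly that the SK_MEMINFO_* constants are
indexes into the array. A sketch of how a reader might use them,
assuming (please confirm) that the number of values reported can
differ from the SK_MEMINFO_VARS the program was compiled against, so
the payload length should be checked. The function name is invented:

```c
#include <assert.h>
#include <stddef.h>
#include <linux/types.h>
#include <linux/sock_diag.h>

/*
 * Hypothetical helper: fetch one SK_MEMINFO_* value from the payload
 * of a UNIX_DIAG_MEMINFO or INET_DIAG_SKMEMINFO attribute.
 * payload_len is the attribute payload size in bytes.
 * Returns 0 on success, or -1 if the payload has no such entry.
 */
static int
skmeminfo_get(const __u32 *vals, size_t payload_len,
              unsigned int idx, __u32 *out)
{
    assert(vals != NULL && out != NULL);

    if (idx >= payload_len / sizeof(*vals))
        return -1;

    *out = vals[idx];
    return 0;
}
```

A caller would pass RTA_DATA(attr) and RTA_PAYLOAD(attr) for the
attribute in question, plus for example SK_MEMINFO_RCVBUF as idx.
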
> +.SH CONFORMING TO
> +The NETLINK_SOCK_DIAG API is Linux-specific.
> +.SH VERSIONS
> +.B NETLINK_SOCK_DIAG
> +was introduced in Linux 3.3.
> +.PP
> +.B UNIX_DIAG_MEMINFO
> +and
> +.BR INET_DIAG_SKMEMINFO
> +were introduced in Linux 3.6.
> +.SH SEE ALSO
> +.BR netlink (3),
> +.BR rtnetlink (3),
> +.BR netlink (7)

In the next iteration, I think it would be simplest to just include
the example program in the same patch.

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/