Re: [PATCH] open.2: improve description of O_DIRECT

Hi Greg,

Sorry for the delay in following up.

In general the patch looks great -- thanks!  I have one small question,
noted below.

Greg Banks wrote:
> Against man-pages-2.76.  Update the description of the O_DIRECT flag to open(2)
> 
>  - to document the behaviour of O_DIRECT with NFS, and
> 
>  - to be clearer about the O_DIRECT alignment restriction
>    mess in Linux, and
> 
>  - to recommend that application writers exercise caution.
> 
> Information from reading NFS & XFS source and talking to XFS folks.
> 
> Signed-off-by: Greg Banks <gnb@xxxxxxxxxxxxxxxxx>
> Reviewed-by: David Chinner <dgc@xxxxxxx>
> Reviewed-by: Jeremy Higdon <jeremy@xxxxxxx>
> References: SGI:PV975946
> ---
> 
>  man2/open.2 |   81 ++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 63 insertions(+), 18 deletions(-)
> 
> Index: man-pages-2.76/man2/open.2
> ===================================================================
> --- man-pages-2.76.orig/man2/open.2	2008-01-18 13:04:11.523554019 +1100
> +++ man-pages-2.76/man2/open.2	2008-01-21 20:31:52.206769981 +1100
> @@ -197,14 +197,9 @@ The I/O is synchronous, that is, at the 
>  .BR read (2)
>  or
>  .BR write (2),
> -data is guaranteed to have been transferred.
> -Under Linux 2.4 transfer sizes, and the alignment of user buffer
> -and file offset must all be multiples of the logical block size
> -of the file system.
> -Under Linux 2.6 alignment to 512-byte boundaries
> -suffices.
> -.\" Alignment should satisfy requirements for the underlying device
> -.\" There may be coherency problems.
> +data is guaranteed to have been transferred.  See
> +.B NOTES
> +below for further discussion.
>  .sp
>  A semantically similar (but deprecated) interface for block devices
>  is described in
> @@ -587,17 +582,67 @@ On many systems the file is actually tru
>  .LP
>  The
>  .B O_DIRECT
> -flag was introduced in SGI IRIX, where it has alignment restrictions
> -similar to those of Linux 2.4.
> -IRIX has also a fcntl(2) call to
> -query appropriate alignments, and sizes.
> -FreeBSD 4.x introduced
> -a flag of same name, but without alignment restrictions.
> -Support was added under Linux in kernel version 2.4.10.
> +flag may impose alignment restrictions on the length and address
> +of userspace buffers and the file offset of I/Os.  In Linux alignment
> +restrictions vary by filesystem and kernel version and might be
> +absent entirely.  However there is currently no filesystem\-independent
> +interface for an application to discover these restrictions for a given
> +file or filesystem.  Some filesystems provide their own interfaces
> +for doing so, for example the
> +.B XFS_IOC_DIOINFO
> +operation in
> +.BR xfsctl (3).
> +.LP
> +Under Linux 2.4, transfer sizes, and the alignment of user buffer
> +and file offset must all be multiples of the logical block size
> +of the file system.  Under Linux 2.6, alignment to 512-byte boundaries
> +suffices.  The flag was introduced in SGI IRIX, where it has alignment
> +restrictions similar to those of Linux 2.4.  IRIX has also a fcntl(2)
> +call to query appropriate alignments, and sizes.  FreeBSD 4.x introduced
> +a flag of the same name, but without alignment restrictions.
> +.LP
> +.B O_DIRECT
> +support was added under Linux in kernel version 2.4.10.
>  Older Linux kernels simply ignore this flag.
> -One may have to define the
> -.B _GNU_SOURCE
> -macro to get its definition.

I take it that you removed that last sentence because the information is
repeated elsewhere on the page?
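For readers following along, here is a minimal sketch of what that sentence is about, combined with the EINVAL behaviour described just below. With glibc, O_DIRECT is only visible from <fcntl.h> if _GNU_SOURCE is defined before the first #include. The helper name open_prefer_direct() and the "#define O_DIRECT 0" fallback are my own illustration, not anything taken from the page or the patch:

```c
/* Must precede every #include, otherwise glibc's <fcntl.h> hides O_DIRECT. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <errno.h>
#include <fcntl.h>

/* Illustrative fallback: if the flag is not defined at all, make it a no-op. */
#ifndef O_DIRECT
#define O_DIRECT 0
#endif

/*
 * Hypothetical helper (not part of any library): try to open `path`
 * with O_DIRECT and, if the filesystem rejects the flag with EINVAL,
 * retry with ordinary buffered I/O.  `flags` must not contain O_CREAT,
 * because no mode argument is passed.  On return, *is_direct is 1 only
 * if the returned descriptor was really opened with O_DIRECT.
 */
static int open_prefer_direct(const char *path, int flags, int *is_direct)
{
    int fd;

    *is_direct = 0;

    fd = open(path, flags | O_DIRECT);
    if (fd >= 0) {
        *is_direct = (O_DIRECT != 0);
        return fd;
    }

    /* Any error other than EINVAL is a real failure; report it as-is. */
    if (errno != EINVAL)
        return -1;

    /* EINVAL: assume the flag is unsupported here and fall back. */
    return open(path, flags);
}
```

A caller would check is_direct afterwards to decide whether it has to honour alignment restrictions on its buffers and offsets.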

> +Some filesystems may not implement the flag and
> +.B open
> +will fail with EINVAL if it is used.
> +.LP
> +Applications should avoid mixing
> +.B O_DIRECT
> +and normal I/O to the same
> +file, and especially to overlapping byte regions in the same file.
> +Even when the filesystem correctly handles the coherency issues in
> +this situation, overall I/O throughput is likely to be slower than
> +using either mode alone.  Likewise, applications should avoid mixing
> +.BR mmap (2)
> +of files with direct I/O to the same files.
> +.LP
> +The behaviour of
> +.B O_DIRECT
> +with NFS will differ from local filesystems.  Older kernels, or
> +kernels configured in certain ways, may not support this combination.
> +The NFS protocol does not support passing the flag to the server, so
> +.B O_DIRECT
> +I/O will only bypass the page cache on the client; the server may
> +still cache the I/O.  The client asks the server to make the I/O
> +synchronous to preserve the synchronous semantics of
> +.BR O_DIRECT .
> +Some servers will perform poorly under these circumstances, especially
> +if the I/O size is small.  Some servers may also be configured to
> +lie to clients about the I/O having reached stable storage; this
> +will avoid the performance penalty at some risk to data integrity
> +in the event of server power failure.  The Linux NFS client places
> +no alignment restrictions on
> +.B O_DIRECT
> +I/O.
> +.PP
> +In summary,
> +.B O_DIRECT
> +is a potentially powerful tool that should be used with caution.  It
> +is recommended that applications treat use of
> +.B O_DIRECT
> +as a performance option which is disabled by default.
>  .PP
>  There are many infelicities in the protocol underlying NFS, affecting
>  amongst others

I applied your patch, did some very light (formatting) edits to your
changes, and reorganized the NOTES section a little afterwards, so that the
O_DIRECT material stands in a subsection of its own.  Also, your new
material gives much better context to Linus's quote, so I relocated that
quote from BUGS into NOTES.
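As an illustration of the alignment caveats in the new text, here is a rough sketch of how an application might prepare a buffer for O_DIRECT I/O. The 4096-byte figure is purely an assumption chosen for the example; as the patch says, there is no filesystem-independent way to discover the real restriction. All names below (DIO_ALIGN_GUESS, dio_round_up, dio_alloc, dio_is_aligned) are hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: a conservative guess, not a value the kernel reports. */
#define DIO_ALIGN_GUESS 4096

/* Round len up to a multiple of align; align must be a power of two. */
static size_t dio_round_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * Allocate a buffer whose address and length are both multiples of
 * align.  The padded length is stored in *padded_len.  Returns NULL
 * on failure; release the buffer with free().
 */
static void *dio_alloc(size_t len, size_t align, size_t *padded_len)
{
    void *buf = NULL;

    *padded_len = dio_round_up(len, align);
    if (posix_memalign(&buf, align, *padded_len) != 0)
        return NULL;
    return buf;
}

/*
 * Check the three quantities the new text mentions: buffer address,
 * transfer length, and file offset.
 */
static int dio_is_aligned(const void *buf, size_t len,
                          long long offset, size_t align)
{
    return (uintptr_t)buf % align == 0
        && len % align == 0
        && (unsigned long long)offset % align == 0;
}
```

An application could run dio_is_aligned() before each read(2) or write(2) on an O_DIRECT descriptor and fall back to a buffered descriptor when the check fails, which fits the recommendation to treat O_DIRECT as an optional performance feature.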

I also added your name to the list of copyright holders for the page, since
you have added a substantial piece to the page.

The changes will be in man-pages-2.78.

Cheers,

Michael

-- 
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug?  Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html

