[PATCH] open.2: improve description of O_DIRECT

Against man-pages-2.76.  Update the description of the O_DIRECT flag to open(2)

 - to document the behaviour of O_DIRECT with NFS,

 - to be clearer about the O_DIRECT alignment restriction
   mess in Linux, and

 - to recommend that application writers exercise caution.

Information from reading NFS & XFS source and talking to XFS folks.

Signed-off-by: Greg Banks <gnb@xxxxxxxxxxxxxxxxx>
Reviewed-by: David Chinner <dgc@xxxxxxx>
Reviewed-by: Jeremy Higdon <jeremy@xxxxxxx>
References: SGI:PV975946
---

 man2/open.2 |   81 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 63 insertions(+), 18 deletions(-)

Index: man-pages-2.76/man2/open.2
===================================================================
--- man-pages-2.76.orig/man2/open.2	2008-01-18 13:04:11.523554019 +1100
+++ man-pages-2.76/man2/open.2	2008-01-21 20:31:52.206769981 +1100
@@ -197,14 +197,9 @@ The I/O is synchronous, that is, at the 
 .BR read (2)
 or
 .BR write (2),
-data is guaranteed to have been transferred.
-Under Linux 2.4 transfer sizes, and the alignment of user buffer
-and file offset must all be multiples of the logical block size
-of the file system.
-Under Linux 2.6 alignment to 512-byte boundaries
-suffices.
-.\" Alignment should satisfy requirements for the underlying device
-.\" There may be coherency problems.
+data is guaranteed to have been transferred.  See
+.B NOTES
+below for further discussion.
 .sp
 A semantically similar (but deprecated) interface for block devices
 is described in
@@ -587,17 +582,67 @@ On many systems the file is actually tru
 .LP
 The
 .B O_DIRECT
-flag was introduced in SGI IRIX, where it has alignment restrictions
-similar to those of Linux 2.4.
-IRIX has also a fcntl(2) call to
-query appropriate alignments, and sizes.
-FreeBSD 4.x introduced
-a flag of same name, but without alignment restrictions.
-Support was added under Linux in kernel version 2.4.10.
+flag may impose alignment restrictions on the length and address
+of userspace buffers and the file offset of I/Os.  In Linux, alignment
+restrictions vary by filesystem and kernel version and might be
+absent entirely.  However, there is currently no filesystem\-independent
+interface for an application to discover these restrictions for a given
+file or filesystem.  Some filesystems provide their own interfaces
+for doing so, for example the
+.B XFS_IOC_DIOINFO
+operation in
+.BR xfsctl (3).
+.LP
+Under Linux 2.4, transfer sizes and the alignment of the user buffer
+and file offset must all be multiples of the logical block size
+of the file system.  Under Linux 2.6, alignment to 512-byte boundaries
+suffices.  The flag was introduced in SGI IRIX, where it has alignment
+restrictions similar to those of Linux 2.4.  IRIX also has a fcntl(2)
+call to query appropriate alignments and sizes.  FreeBSD 4.x introduced
+a flag of the same name, but without alignment restrictions.
+.LP
+.B O_DIRECT
+support was added under Linux in kernel version 2.4.10.
 Older Linux kernels simply ignore this flag.
-One may have to define the
-.B _GNU_SOURCE
-macro to get its definition.
+Some filesystems may not implement the flag, in which case
+.BR open (2)
+will fail with EINVAL if it is used.
+.LP
+Applications should avoid mixing
+.B O_DIRECT
+and normal I/O to the same
+file, and especially to overlapping byte regions in the same file.
+Even when the filesystem correctly handles the coherency issues in
+this situation, overall I/O throughput is likely to be slower than
+using either mode alone.  Likewise, applications should avoid mixing
+.BR mmap (2)
+of files with direct I/O to the same files.
+.LP
+The behaviour of
+.B O_DIRECT
+with NFS will differ from that on local filesystems.  Older kernels, or
+kernels configured in certain ways, may not support this combination.
+The NFS protocol does not support passing the flag to the server, so
+.B O_DIRECT
+I/O will only bypass the page cache on the client; the server may
+still cache the I/O.  The client asks the server to make the I/O
+synchronous to preserve the synchronous semantics of
+.BR O_DIRECT .
+Some servers will perform poorly under these circumstances, especially
+if the I/O size is small.  Some servers may also be configured to
+lie to clients about the I/O having reached stable storage; this
+will avoid the performance penalty at some risk to data integrity
+in the event of server power failure.  The Linux NFS client places
+no alignment restrictions on
+.B O_DIRECT
+I/O.
+.PP
+In summary,
+.B O_DIRECT
+is a potentially powerful tool that should be used with caution.  It
+is recommended that applications treat use of
+.B O_DIRECT
+as a performance option which is disabled by default.
 .PP
 There are many infelicities in the protocol underlying NFS, affecting
 amongst others
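
For readers of the patch, here is a minimal C sketch of the advice in the
new NOTES text: treat O_DIRECT as an opt-in performance option, fall back
to buffered I/O when open(2) fails with EINVAL, and align buffers
conservatively. The helper names dio_round_up, dio_alloc and dio_open, and
the DIO_ALIGN_GUESS value of 4096, are hypothetical and not part of the
patch or of any library. Since there is no filesystem-independent way to
query the real restriction, 4096 is only an assumption that happens to
cover the 512-byte case mentioned above; on glibc, _GNU_SOURCE is defined
so that <fcntl.h> exposes O_DIRECT.

```c
#define _GNU_SOURCE             /* glibc: makes O_DIRECT visible in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: a conservative guess, not a value queried from the kernel
 * or the filesystem. */
#define DIO_ALIGN_GUESS 4096

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t dio_round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Allocate a buffer whose address and length are both multiples of align.
 * Returns NULL on failure; release with free(). */
static void *dio_alloc(size_t len, size_t align)
{
    void *buf = NULL;

    if (posix_memalign(&buf, align, dio_round_up(len, align)) != 0)
        return NULL;
    return buf;
}

/* Open path with O_DIRECT only when the caller opts in (want_direct != 0).
 * If the filesystem rejects the flag with EINVAL, retry without it.
 * *is_direct reports which mode was actually obtained. */
static int dio_open(const char *path, int flags, mode_t mode,
                    int want_direct, int *is_direct)
{
    int fd;

    *is_direct = 0;
    if (want_direct) {
        fd = open(path, flags | O_DIRECT, mode);
        if (fd >= 0) {
            *is_direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;          /* a real error, not "flag unsupported" */
    }
    return open(path, flags, mode);
}
```

A caller would typically obtain the descriptor with dio_open(), allocate
its buffer with dio_alloc(), and keep file offsets and transfer lengths at
multiples of the same alignment whenever *is_direct is set. Using one
descriptor and one mode per file also follows the recommendation not to
mix direct and buffered I/O on the same file.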

-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
