Re: [PATCH] pNFS: Avoid read-modify-write for page-aligned full page write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019/02/08 23:58, Trond Myklebust wrote:
On Fri, 2019-02-08 at 16:54 +0900, 伊藤和夫 wrote:
On 2019/02/07 22:37, Benjamin Coddington wrote:
On 7 Feb 2019, at 3:12, Kazuo Ito wrote:
[snipped]
@@ -299,8 +305,10 @@ static int nfs_want_read_modify_write(struct
file
*file, struct page *page,
      unsigned int end = offset + len;

      if (pnfs_ld_read_whole_page(file->f_mapping->host)) {
-        if (!PageUptodate(page))
-            return 1;
+        if (!PageUptodate(page)) {
+            if (pglen && (end < pglen || offset))
+                return 1;
+        }
          return 0;
      }

This looks right.  I think that a static inline bool
nfs_write_covers_page,
or full_page_write or similar might make sense here, as we do the
same test
just below, and would make the code easier to quickly understand.

Reviewed-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
  >
  > Ben

As per Ben's comment, I made the check for full page write a static
inline function and both the block-oriented and the non-block-
oriented paths call it.

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 29553fdba8af..458c77ccf274 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -276,6 +276,12 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
    * then a modify/write/read cycle when writing to a page in the
    * page cache.
    *
+ * Some pNFS layout drivers can only read/write at a certain block
+ * granularity like all block devices and therefore we must perform
+ * read/modify/write whenever a page hasn't read yet and the data
+ * to be written there is not aligned to a block boundary and/or
+ * smaller than the block size.
+ *
    * The modify/write/read cycle may occur if a page is read before
    * being completely filled by the writer.  In this situation, the
    * page must be completely written to stable storage on the server
@@ -291,15 +297,23 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
    * and that the new data won't completely replace the old data in
    * that range of the file.
    */
-static int nfs_want_read_modify_write(struct file *file, struct page
*page,
-			loff_t pos, unsigned len)
+static bool nfs_full_page_write(struct page *page, loff_t pos,
unsigned
len)
   {
   	unsigned int pglen = nfs_page_length(page);
   	unsigned int offset = pos & (PAGE_SIZE - 1);
   	unsigned int end = offset + len;

+	if (pglen && ((end < pglen) || offset))
+	    return 0;
+	return 1;
+}
+
+static int nfs_want_read_modify_write(struct file *file, struct page
*page,
+			loff_t pos, unsigned len)
+{
   	if (pnfs_ld_read_whole_page(file->f_mapping->host)) {
-		if (!PageUptodate(page))
+		if (!PageUptodate(page) &&
+		    !nfs_full_page_write(page, pos, len))
   			return 1;
   		return 0;
   	}
@@ -307,8 +321,7 @@ static int nfs_want_read_modify_write(struct
file
*file, struct page *page,
   	if ((file->f_mode & FMODE_READ) &&	/* open for read? */
   	    !PageUptodate(page) &&		/* Uptodate? */
   	    !PagePrivate(page) &&		/* i/o request already? */
-	    pglen &&				/* valid bytes of
file? */
-	    (end < pglen || offset))		/* replace all valid
bytes? */
+	    !nfs_full_page_write(page, pos, len))
   		return 1;
   	return 0;
   }

How about adding a separate

if (PageUptodate(page) || nfs_full_page_write())
     return 0;

before the check for pNFS?
>
> That means we won't have to duplicate those for the pNFS block and
> ordinary case, and it improves code clarity.

Yes, it is much better, and

BTW: Why doesn't the pNFS case check for PagePrivate(page)? That looks
like a bug which would cause the existing write to get corrupted.
If so, we should move that check too into the common code.

It's been that way since the check for
pnfs_ld_read_whole_page(file->f_mapping->host) was added there.
As you pointed out, it shouldn't try to initiate read when there's
an outstanding write.
So, I'll update the patch with these changes, including check for
ongoing I/O, and come up with newer test results in a couple of days.
--
kazuo ito (ito_kazuo_g3@xxxxxxxxxxxxxx)
NTT OSS Center



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux