Re: [PATCH v4] ceph: invalidate pages when doing direct/sync writes

On 4/7/22 11:15 PM, Luís Henriques wrote:
When doing a direct/sync write, we need to invalidate the page cache in
the range being written to.  If we don't do this, the cache will include
invalid data as we just did a write that avoided the page cache.

Signed-off-by: Luís Henriques <lhenriques@xxxxxxx>
---
  fs/ceph/file.c | 19 ++++++++++++++-----
  1 file changed, 14 insertions(+), 5 deletions(-)

Changes since v3:
- Dropped initial call to invalidate_inode_pages2_range()
- Added extra comment to document invalidation

Changes since v2:
- Invalidation needs to be done after a write

Changes since v1:
- Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range
- Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5072570c2203..97f764b2fbdd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1606,11 +1606,6 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
  		return ret;
  	ceph_fscache_invalidate(inode, false);
-	ret = invalidate_inode_pages2_range(inode->i_mapping,
-					    pos >> PAGE_SHIFT,
-					    (pos + count - 1) >> PAGE_SHIFT);
-	if (ret < 0)
-		dout("invalidate_inode_pages2_range returned %d\n", ret);
  	while ((len = iov_iter_count(from)) > 0) {
  		size_t left;
@@ -1938,6 +1933,20 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
  			break;
  		}
  		ceph_clear_error_write(ci);
+
+		/*
+		 * we need to invalidate the page cache here, otherwise the
+		 * cache will include invalid data in direct/sync writes.
+		 */
+		ret = invalidate_inode_pages2_range(
+				inode->i_mapping,
+				pos >> PAGE_SHIFT,
+				(pos + len - 1) >> PAGE_SHIFT);
+		if (ret < 0) {
+			dout("invalidate_inode_pages2_range returned %d\n",
+			     ret);
+			ret = 0;

IMO this is not safe. If we just ignore the error, the page cache will still contain invalid data.

I think what 'ceph_direct_read_write()' does is more correct. It first calls 'invalidate_inode_pages2_range()', which does not block, to make sure all the dirty pages are written back from the page cache. Later, if some pages could not be unmapped and 'invalidate_inode_pages2_range()' returned EBUSY, it does a blocking invalidation with 'truncate_inode_pages_range()'.

That way we can always be sure the page cache holds no invalid data after the write finishes. I think the truncate helper is used there because it is safe, and there shouldn't be any buffered writes happening during DIO?
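
To make the two-step idea concrete, here is a rough userspace model of it. This is only a sketch and not kernel code: 'struct model_page', the 'model_*' helpers, 'MODEL_PAGE_SHIFT' and the toy 'cache' array are all made up for illustration, and the "busy" flag merely stands in for whatever makes the non-blocking pass return EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_NR_PAGES   8	/* size of the toy cache */

/* Hypothetical stand-in for a cached page; not a kernel structure. */
struct model_page {
	bool cached;	/* page is present in the model page cache */
	bool busy;	/* page cannot be dropped without blocking */
};

static struct model_page cache[MODEL_NR_PAGES];

/* Non-blocking pass: drop what can be dropped, return -EBUSY otherwise. */
static int model_invalidate_range(unsigned long first, unsigned long last)
{
	int ret = 0;
	unsigned long i;

	for (i = first; i <= last && i < MODEL_NR_PAGES; i++) {
		if (!cache[i].cached)
			continue;
		if (cache[i].busy)
			ret = -EBUSY;
		else
			cache[i].cached = false;
	}
	return ret;
}

/* Blocking pass: unconditionally drop every page in the range. */
static void model_truncate_range(unsigned long first, unsigned long last)
{
	unsigned long i;

	for (i = first; i <= last && i < MODEL_NR_PAGES; i++) {
		cache[i].cached = false;
		cache[i].busy = false;
	}
}

/* After a write to [pos, pos + len): invalidate, fall back on -EBUSY. */
int model_invalidate_after_write(long long pos, size_t len)
{
	unsigned long first, last;
	int ret;

	assert(pos >= 0 && len > 0);	/* "len - 1" below needs len > 0 */
	first = (unsigned long)(pos >> MODEL_PAGE_SHIFT);
	last = (unsigned long)((pos + (long long)len - 1) >> MODEL_PAGE_SHIFT);

	ret = model_invalidate_range(first, last);
	if (ret == -EBUSY) {
		/* Do not ignore the error: force the stale pages out. */
		model_truncate_range(first, last);
		ret = 0;
	}
	return ret;
}
```

In this model, if page 1 is marked cached and busy, 'model_invalidate_after_write(4096, 4096)' takes the fallback path and page 1 is no longer cached afterwards, instead of the error being dropped with the stale page left in place.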

But from my understanding 'ceph_direct_read_write()' is still buggy. What if a page fault happens just after 'truncate_inode_pages_range()'? Can this happen? Should we leave it to user space to guarantee this with a file lock?

Thoughts?

-- Xiubo

+		}
  		pos += len;
  		written += len;
  		dout("sync_write written %d\n", written);




