Hi Trond et al, we seem to have a regression introduced by commit 7b281ee026552f10862b617a2a51acf49c829554 NFS: fsync() must exit with an error if page writeback failed which has found it's way (in different form into -stable releases). The problem is that an NFSERR_NOSPC comes through as EIO. e.g. if /mnt2/ if an nfs mounted filesystem that has no space then strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1 reported Input/output error and the relevant parts of the strace output are: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 EIO (Input/output error) close(1) = -1 ENOSPC (No space left on device) i.e. we get an EIO from fsync, then the ENOSPC comes with the close. If don't do the fsync, then: strace dd if=/dev/zero >> /mnt2/afile count=1 write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 close(1) = -1 EIO (Input/output error) we don't see the ENOSPC at all. The problem is that filemap_fdatawait_range, when it sees a page with PageError set, will return -EIO unless AS_ENOSPC is set. NFS never sets AS_ENOSPC so we get -EIO. Then nfs_file_fsync() will return as soon as it sees an error from filemap_write_and_wait_range(), which will be that EIO. nfs_file_fsync_commit knows to prefer the error from ctx->error over any other error, but nfs_file_fsync() doesn't. I see two ways to "fix" this. We could get nfs{,4}_file_fsync() to always call nfs_file_fsync_commit() and use the error from the later in preference to the error from filemap_write_and_wait_range(). That results in this, more correct, behaviour: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 ENOSPC (No space left on device) close(1) = 0 Or we could get nfs_context_set_write_error() to call mapping_set_error(), which would result in AS_ENOSPC being set and so filemap_write_and_wait_range() will return the "correct" error. This results in this behaviour: write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 fsync(1) = -1 ENOSPC (No space left on device) close(1) = -1 ENOSPC (No space left on device) which is a bit odd. The first ENOSPC is from AS_ENOSPC. The second is from ctx->error. I feel that calling mapping_set_error() is the "right" thing to do, but it would need a bit more work to avoid the double errors. What approach would you prefer? The two patches I used are: diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 582bb88..19c06b9 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -295,12 +295,12 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct inode *inode = file->f_path.dentry->d_inode; do { - ret = filemap_write_and_wait_range(inode->i_mapping, start, end); - if (ret != 0) - break; + int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end); mutex_lock(&inode->i_mutex); ret = nfs_file_fsync_commit(file, start, end, datasync); mutex_unlock(&inode->i_mutex); + if (ret == 0) + ret = ret1; /* * If nfs_file_fsync_commit detected a server reboot, then * resend all dirty pages that might have been covered by diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index afddd66..f0d9a88 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -96,15 +96,15 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) struct inode *inode = file->f_path.dentry->d_inode; do { - ret = filemap_write_and_wait_range(inode->i_mapping, start, end); - if (ret != 0) - break; + int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end); mutex_lock(&inode->i_mutex); ret = nfs_file_fsync_commit(file, start, end, datasync); if (!ret && !datasync) /* application has asked for meta-data sync */ ret = pnfs_layoutcommit_inode(inode, true); mutex_unlock(&inode->i_mutex); + if (ret == 0) + ret = ret1; /* * If nfs_file_fsync_commit detected a server reboot, then * resend all dirty pages that might have been covered by and diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 9347ab7..b0f5bb7 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -140,6 +140,7 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error) ctx->error = error; smp_wmb(); set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags); + mapping_set_error(ctx->dentry->d_inode->i_mapping, error); } static struct nfs_page * Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature