Re: [PATCH 06/13] xfs: xfs_sync_data is redundant.

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 3 Oct 2012 06:51:24 +1000

On Tue, Oct 02, 2012 at 09:01:33AM -0400, Brian Foster wrote:
> On 10/01/2012 08:44 PM, Brian Foster wrote:
> > On 10/01/2012 08:10 PM, Dave Chinner wrote:
> ...
> > 
> > I gave this a quick couple runs against 273 and it passes (on top of
> > the entire die-xfssyncd-die patchset). I'll kick off another full run
> > on this box overnight. Thanks!
> > 
> 
> And I spoke a bit too soon... I hit the following warning with this change:
> 
> WARNING: at fs/fs-writeback.c:1401 sync_inodes_sb+0xc0/0xd0()
> 
> The inline patch addresses it. I also see the following message during
> 273 but it doesn't appear related to this set:
> 
> kernel: XFS (dm-4): xlog_verify_grant_tail: space > BBTOB(tail_blocks)
> 
> Brian
> 
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index da69c18..f11133b 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -294,7 +294,9 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
>  static inline void
>  xfs_flush_inodes(struct xfs_inode *ip)
>  {
> -	writeback_inodes_sb_if_idle(VFS_I(ip)->i_sb, WB_REASON_FS_FREE_SPACE);
> +	down_read(&VFS_I(ip)->i_sb->s_umount);
> +	sync_inodes_sb(VFS_I(ip)->i_sb);
> +	up_read(&VFS_I(ip)->i_sb->s_umount);
>  }

I don't think we can do an unconditional down_read() there, as the
caller from xfs_create() already holds an i_mutex (the VFS holds the
directory inode lock) and I'm pretty sure that s_umount is supposed
to be outside per-inode locks.
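The ordering rule above can be modelled with a toy rank check. This is a much simplified stand-in for the kernel's lockdep, not its API. It assumes, as suggested above, that s_umount ranks outside (below) the per-inode i_mutex. The enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock ranks: a lock may only be taken while every lock
 * already held has a strictly lower rank. s_umount is assumed to rank
 * outside the per-inode i_mutex. */
enum lock_rank { RANK_S_UMOUNT = 1, RANK_I_MUTEX = 2 };

#define MAX_HELD 8

struct held_locks {
	int depth;
	enum lock_rank stack[MAX_HELD];
};

/* Returns true and records the lock if taking 'next' respects the
 * ordering; returns false (recording nothing) if it would invert it. */
static bool order_ok_acquire(struct held_locks *h, enum lock_rank next)
{
	for (int i = 0; i < h->depth; i++)
		if (h->stack[i] >= next)
			return false;	/* e.g. s_umount taken under i_mutex */
	assert(h->depth < MAX_HELD);
	h->stack[h->depth++] = next;
	return true;
}
```

Taking s_umount and then i_mutex passes. Taking i_mutex first, which is what the xfs_create() path does, and then s_umount is flagged as an inversion.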

Given that at these call sites we are inside a transaction in the
create case, and inside mnt_want_write() protection in the buffered
write case, the likelihood of s_umount being held for write at
ENOSPC is going to be practically non-existent. Hence a
down_read_trylock() avoids the lock ordering issue, but will almost
always succeed and so be equivalent to down_read()....

/me modifies and runs 273 and the enospc xfstests group...

Seems to work just fine, and no warnings. Patch below.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

xfs: make inode writeback at ENOSPC blocking.

From: Dave Chinner <dchinner@xxxxxxxxxx>

writeback_inodes_sb_if_idle() is not sufficient to trigger delalloc
conversion fast enough to prevent spurious ENOSPC when there are
hundreds of writers, thousands of small files and GBs of free RAM.
Change this to use sync_inodes_sb() to block callers while we wait
for writeback, like the previous xfs_flush_inodes implementation did.

We have to hold the s_umount lock here, but because this call can
nest inside i_mutex (the parent directory in the create case, held
by the VFS), we have to use down_read_trylock() to avoid potential
deadlocks. In practice, this trylock will succeed on almost every
attempt as unmount/remount type operations are exceedingly rare.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_inode.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index da69c18..b3dabe9 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -294,7 +294,12 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
 static inline void
 xfs_flush_inodes(struct xfs_inode *ip)
 {
-	writeback_inodes_sb_if_idle(VFS_I(ip)->i_sb, WB_REASON_FS_FREE_SPACE);
+	struct super_block *sb = VFS_I(ip)->i_sb;
+
+	if (down_read_trylock(&sb->s_umount)) {
+		sync_inodes_sb(sb);
+		up_read(&sb->s_umount);
+	}
 }
 
 /*

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs