Re: iomap infrastructure and multipage writes V5

Another quiet weekend trying to debug this, and only minor progress.

The biggest difference in traces of the old vs new code is that we manage
to allocate much bigger delalloc reservations at a time in xfs_bmapi_delay
-> xfs_bmapi_reserve_delalloc.  The old code always went for a single FSB,
which also meant allocating an indlen of 7 FSBs.  With the iomap code
we always allocate at least 4 FSBs (aka a page), and sometimes 8 or 12.
All of these still need 7 FSBs for the worst case indirect blocks.  So
what happens here is that in an ENOSPC case we manage to allocate more
actual delalloc blocks before hitting ENOSPC - notwithstanding that the
old case would immediately release them a little later in
xfs_bmap_add_extent_hole_delay after merging the delalloc extents.
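
To put rough numbers on the above, here is a toy model of the
arithmetic.  It is not XFS code: the function name and the TOY_INDLEN
constant are made up for illustration, and it assumes every reservation
costs len + 7 FSBs and deliberately ignores the indlen that gets handed
back when delalloc extents are merged.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case indirect block reservation per call, 7 FSBs as in the traces. */
#define TOY_INDLEN	7

/*
 * Toy model only: count how many data blocks end up reserved before a
 * request of (len + indlen) no longer fits into the free pool.
 * Merging of adjacent delalloc extents is not modelled.
 */
static uint64_t
toy_data_blocks_before_enospc(
	uint64_t	free_fsbs,
	uint64_t	len,
	uint64_t	indlen)
{
	uint64_t	reserved = 0;

	assert(len > 0);
	while (free_fsbs >= len + indlen) {
		free_fsbs -= len + indlen;	/* data blocks plus worst-case indlen */
		reserved += len;		/* only the data blocks need cleanup later */
	}
	return reserved;
}
```

With a hypothetical pool of 88 free FSBs this gives 11 data blocks for
len = 1, 32 for len = 4 and 48 for len = 12, i.e. the larger requests
leave more delalloc blocks behind by the time the pool runs dry.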

On the writeback side I don't see too many changes either.  We'll
eventually run out of blocks when allocating the transaction in
xfs_iomap_write_allocate because the reserved pool is too small.  The
only real difference to before is that under the ENOSPC / out of memory
case we have allocated between 4 and 12 times more blocks, so we have
to clean up 4 to 12 times as much while write_cache_pages continues
iterating over these dirty delalloc blocks.   For me this happens
~6 times as much as before, but I still don't manage to hit an
endless loop.

Now after spending this much time I've started wondering why we even
reserve blocks in xfs_iomap_write_allocate - after all we've reserved
space for the actual data blocks and the indlen worst case in
xfs_bmapi_reserve_delalloc.  And in fact a little hack to drop that
reservation seems to solve both the root cause (depleted reserved pool)
and the cleanup mess.  I just haven't spent enough time to convince
myself that it's actually safe, and in fact looking at the allocator
makes me think it only works by accident currently despite generally
positive test results.

Here is the quick patch if anyone wants to chime in:

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 620fc91..67c317f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -717,7 +717,7 @@ xfs_iomap_write_allocate(
 
 		nimaps = 0;
 		while (nimaps == 0) {
-			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
+			nres = 0; // XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 
 			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
 					0, XFS_TRANS_RESERVE, &tp);