Another quiet weekend trying to debug this, and only minor progress.

The biggest difference in traces of the old vs new code is that we manage to allocate much bigger delalloc reservations at a time in xfs_bmapi_delay -> xfs_bmapi_reserve_delalloc. The old code always went for a single FSB, which also meant allocating an indlen of 7 FSBs. With the iomap code we always allocate at least 4 FSBs (aka a page), and sometimes 8 or 12. All of these still need 7 FSBs for the worst case indirect blocks. So what happens here is that in an ENOSPC case we manage to allocate more actual delalloc blocks before hitting ENOSPC - notwithstanding that the old case would immediately release them a little later in xfs_bmap_add_extent_hole_delay after merging the delalloc extents.

On the writeback side I don't see too many changes either. We'll eventually run out of blocks when allocating the transaction in xfs_iomap_write_allocate because the reserved pool is too small. The only real difference to before is that under the ENOSPC / out of memory case we have allocated between 4 and 12 times more blocks, so we have to clean up 4 to 12 times as much while write_cache_pages continues iterating over these dirty delalloc blocks. For me this happens ~6 times as much as before, but I still don't manage to hit an endless loop.

Now after spending this much time I've started wondering why we even reserve blocks in xfs_iomap_write_allocate - after all we've reserved space for the actual data blocks and the indlen worst case in xfs_bmapi_reserve_delalloc. And in fact a little hack to drop that reservation seems to solve both the root cause (depleted reserved pool) and the cleanup mess. I just haven't spent enough time to convince myself that it's actually safe, and in fact looking at the allocator makes me think it only works by accident currently, despite generally positive test results.
Here is the quick patch if anyone wants to chime in:

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 620fc91..67c317f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -717,7 +717,7 @@ xfs_iomap_write_allocate(
 
 		nimaps = 0;
 		while (nimaps == 0) {
-			nres = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
+			nres = 0; // XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK);
 
 			error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, nres,
 					0, XFS_TRANS_RESERVE, &tp);