From: Dave Chinner <dchinner@xxxxxxxxxx> This is fixing a g/476 AGF ABBA deadlock that occurs reliably with this patchset so far. Allocation is attempted in AG 2, which has no space, then AG 3 which fails with "nominleft" search error. THis then returns to the caller with AGF 3 still locked, and the caller then does a retry with more restricted allocation criteria. This then starts again at AG 2, which then deadlocks because it's the wrong AGF locking order. What I can't work out is how the existing TOT code isn't hitting this every time g/476 is run. I can't see where it unlocks AGF 3, nor can I see how it avoids the out of order locking on nospace retries. So this looks like a pre-existing bug, but it takes this allocator rework to expose it? More work investigation needed here. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- fs/xfs/libxfs/xfs_alloc.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index abf78453d155..c0af59e5e935 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3332,8 +3332,27 @@ xfs_alloc_vextent_iterate_ags( args->pag = NULL; return error; } - if (args->agbp) + if (args->agbp) { + /* + * XXX: this looks like a pre-existing bug. alloc_size fails + * because of nominleft, and we return here with the AGF locked + * with args->agbno == NULLAGBLOCK If this happens with any AG + * higher than the start_agno, if the caller then tries to allocate + * again with more restricted parameters, we try locking from + * start_agno again and we deadlock because we've already got a + * higher AGF locked. Hence we need to drop the AGF lock if + * we failed to allocate here. g/476 triggers this reliably. + */ + if (args->agbno == NULLAGBLOCK) { + /* + * XXX: This is not a reliable workaround if the AGF was + * modified! + */ + xfs_trans_brelse(args->tp, args->agbp); + args->agbp = NULL; + } return 0; + } /* * We didn't find an AG we can alloation from. If we were given -- 2.35.1