On Tue, Feb 11, 2014 at 01:07:46PM -0500, Brian Foster wrote: > The inode chunk allocation path can lead to deadlock conditions if > a transaction is dirtied with an AGF (to fix up the freelist) for > an AG that cannot satisfy the actual allocation request. This code > path is written to try and avoid this scenario, but it can be > reproduced by running xfstests generic/270 in a loop on a 512b fs. > > An example situation is: > - process A attempts an inode allocation on AG 3, modifies > the freelist, fails the allocation and ultimately moves on to > AG 0 with the AG 3 AGF held > - process B is doing a free space operation (i.e., truncate) and > acquires the AG 0 AGF, waits on the AG 3 AGF > - process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock) > > The problem here is that process A acquired the AG 3 AGF while > moving on to AG 0 (and releasing the AG 3 AGI with the AG 3 AGF > held). xfs_dialloc() makes one pass through each of the AGs when > attempting to allocate an inode chunk. The expectation is a clean > transaction if a particular AG cannot satisfy the allocation > request. xfs_ialloc_ag_alloc() is written to support this through > use of the minalignslop allocation args field. > > When using the agi->agi_newino optimization, we attempt an exact > bno allocation request based on the location of the previously > allocated chunk. minalignslop is set to inform the allocator that > we will require alignment on this chunk, and thus to not allow the > request for this AG if the extra space is not available. Suppose > that the AG in question has just enough space for this request, but > not at the requested bno. xfs_alloc_fix_freelist() will proceed as > normal as it determines the request should succeed, and thus it is > allowed to modify the agf. xfs_alloc_ag_vextent() ultimately fails > because the requested bno is not available. In response, the caller > moves on to a NEAR_BNO allocation request for the same AG. The > alignment is set, but the minalignslop field is never reset. This > increases the overall requirement of the request from the first > attempt. If this delta is the difference between allocation success > and failure for the AG, xfs_alloc_fix_freelist() rejects this > request outright the second time around and causes the allocation > request to unnecessarily fail for this AG. > > To address this situation, reset the minalignslop field immediately > after use and prevent it from leaking into subsequent requests. > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > --- > > v2: > - Reset minalignslop immediately after use rather than prior to the > subsequent request and add a comment. [dchinner] > ping? Any chance to get this committed? Brian > fs/xfs/xfs_ialloc.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c > index 5d7f105..a57843f 100644 > --- a/fs/xfs/xfs_ialloc.c > +++ b/fs/xfs/xfs_ialloc.c > @@ -363,6 +363,18 @@ xfs_ialloc_ag_alloc( > args.minleft = args.mp->m_in_maxlevels - 1; > if ((error = xfs_alloc_vextent(&args))) > return error; > + > + /* > + * This request might have dirtied the transaction if the AG can > + * satisfy the request, but the exact block was not available. > + * If the allocation did fail, subsequent requests will relax > + * the exact agbno requirement and increase the alignment > + * instead. It is critical that the total size of the request > + * (len + alignment + slop) does not increase from this point > + * on, so reset minalignslop to ensure it is not included in > + * subsequent requests. > + */ > + args.minalignslop = 0; > } else > args.fsbno = NULLFSBLOCK; > > -- > 1.8.3.1 > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs