On Fri, Apr 17, 2015 at 08:28:29AM +1000, Dave Chinner wrote: > On Thu, Apr 16, 2015 at 01:32:38PM -0400, Brian Foster wrote: > > On Thu, Apr 16, 2015 at 03:00:50PM +1000, Dave Chinner wrote: > > > + * Calculate the maximum extent length we can ask to allocate after taking into > > > + * account the on-disk size limitations, the extent size hints and the size > > > + * being requested. We have to deal with the extent size hint here because the > > > + * allocation will attempt alignment and hence grow the length outwards by up to > > > + * @extsz on either side. > > > + */ > > > +static inline xfs_extlen_t > > > +xfs_bmapi_max_extlen( > > > + struct xfs_inode *ip, > > > + xfs_extlen_t length) > > > +{ > > > + xfs_extlen_t extsz = xfs_get_extsz_hint(ip); > > > + xfs_extlen_t max_length = MAXEXTLEN; > > > + > > > + if (extsz) > > > + max_length -= 2 * extsz - 1; > > > > This can underflow or cause other issues if set to just the right value > > (with smaller block sizes such that length can be trimmed to 0): > > But I assumed the existing code was correct for this context. My > bad. :/ > > > $ mkfs.xfs -f -bsize=1k <dev> > > $ mount <dev> /mnt > > $ xfs_io -f -c "extsize 1g" -c "pwrite 0 4k" -c fsync /mnt/file > > pwrite64: No space left on device > > Yup, because it 2^21 = 2G, and extsize = 1g puts max_length < 0. > > Ok. So, the problem is that it is overestimating the amount of space > that alignment will need, and that alignment cannot be guaranteed > for extsz hints of over (MAXEXTLEN / 2) in size. > > i.e. given an alignment (A[0-2]) and an extent (E[01]): > > A0 A1 A2 > +-------------------+-------------------+ > +ooo+ > E0 E1 > > The problem is that the alignment done by xfs_bmap_extsize_align() > only extends outwards (i.e. increases extent size). Hence E0 gets > rounded down to A0-A2, and E1 gets extended to A2, which means we > are adding almost 2 entire extent size hints to the allocation. > That's where the reduction in length by two extsz values came from. > > Now, for delayed allocation, this is just fine, because real > allocation will break this delalloc extent up into two separate > extents, and underflow wouldn't be noticed as delalloc extents are > not physically limited to MAXEXTLEN and so nothing would have > broken. Still, it's not the intended behaviour. > > I'm not sure what the solution is yet - the fundamental problem here > is the outwards alignment of both ends of the extent, and this > MAXEXTLEN twiddling is just an encoding of that behaviour. I need to > spend some time looking at xfs_bmap_extsize_align() and determining > if there is something we can do differently here. Ok, so the callers of xfs_bmap_extsize_align() are: xfs_bmapi_reserve_delalloc() xfs_bmap_btalloc() xfs_bmap_rtalloc(). For xfs_bmapi_reserve_delalloc(), the alignment does not need grow outwards; it can be truncated mid-range, and the code should still work. i.e. A0 A1 A2 +-------------------+-------------------+ +ooo+ E0 E1 +-------------------+ R0 R1 R[01] is a valid alignment and will result in a second allocation occurring for this: A0 A1 A2 +-------------------+-------------------+ +o+ E2 E1 +-------------------+ R1 R2 And so the range we need allocation for (E[01]) will be allocated and correctly extent size aligned. For xfs_bmap_btalloc() - the problem case here - the code is a little more complex. We do: xfs_bmapi_write loop until all allocated { xfs_bmapi_allocate(bma) calc off/len xfs_bmap_btalloc(bma) xfs_bmap_extsize_align(bma) xfs_alloc_vextent update bma->length BMBT insert trim returned map } So we are doing alignment two steps removed from the off/len calculation (xfs_bmap_rtalloc() is in the same boat). Hence the question is whether xfs_bmap_extsize_align() can trim the range being allocated and still have everything work.... Ok, upon further reading, the xfs_bmalloc structure (bma) that is passed between these functions to track the allocation being done is updated after allocation with the length of the extent allocated. IOWs: bma->length = E(len) xfs_bmap_btalloc(bma) A(len) = xfs_bmap_extsize_align(bma->length) R(len) = xfs_alloc_vextent(A(len)) bma->length = R(len) Hence this: A0 A1 A2 +-------------------+-------------------+ +ooo+ E0 E1 +-------------------+ R0 R1 Is a valid result from xfs_bmap_btalloc() and the loop in xfs_bmapi_write() will do a second allocation and alignment as per the above delalloc case. xfs_bmap_rtalloc() appears to mirror this same structure, so should also have the same behaviour. What this means is that we can actually reduce the requested allocation to be only a partial overlap when aligning it, and everything should still work. Let's now see how complex that makes the code... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs