On Wed, May 12, 2021 at 02:49:30PM -0700, Darrick J. Wong wrote: > On Thu, May 06, 2021 at 05:20:53PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Because it's a mess. > > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> ..... > > + /* > > + * Check that there is enough free space for the file plus a chunk of > > + * inodes if we need to allocate some. If this is the first pass across > > + * the AGs, take into account the potential space needed for alignment > > + * of inode chunks when checking the longest contiguous free space in > > + * the AG - this prevents us from getting ENOSPC because we have free > > + * space larger than ialloc_blks but alignment constraints prevent us > > + * from using it. > > + * > > + * If we can't find an AG with space for full alignment slack to be > > + * taken into account, we must be near ENOSPC in all AGs. Hence we > > + * don't include alignment for the second pass and so if we fail > > + * allocation due to alignment issues then it is most likely a real > > + * ENOSPC condition. > > + * > > + * XXX(dgc): this calculation is now bogus thanks to the per-ag > > + * reservations that xfs_alloc_fix_freelist() now does via > > + * xfs_alloc_space_available(). When the AG fills up, pagf_freeblks will > > + * be more than large enough for the check below to succeed, but > > + * xfs_alloc_space_available() will fail because of the non-zero > > + * metadata reservation and hence we won't actually be able to allocate > > + * more inodes in this AG. We do soooo much unnecessary work near ENOSPC > > + * because of this. > > Yuck. Can this be fixed by doing: > > really_free = pag->pagf_freeblks - > xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE); > return really_free >= needspace + ineed && longest >= ineed) > > to account for those reservations, perhaps? Something like that, though I'd much prefer to factor the freelist fixup calculations and use them here so we have a single set of "is there enough space in this AG for allocating X blocks" functions. It's somewhat outside the scope of this patchset, so I wrote the comment rather than trying to address it here and complicate this patchset.... > > @@ -1624,7 +1746,6 @@ xfs_dialloc( > > * Files of these types need at least one block if length > 0 > > * (and they won't fit in the inode, but that's hard to figure out). > > */ > > - needspace = S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode); > > if (S_ISDIR(mode)) > > start_agno = xfs_ialloc_next_ag(mp); > > else { > > @@ -1635,7 +1756,7 @@ xfs_dialloc( > > > > /* > > * If we have already hit the ceiling of inode blocks then clear > > - * okalloc so we scan all available agi structures for a free > > + * ok_alloc so we scan all available agi structures for a free > > * inode. > > * > > * Read rough value of mp->m_icount by percpu_counter_read_positive, > > @@ -1644,7 +1765,7 @@ xfs_dialloc( > > Er... didn't this logic get split into xfs_dialloc_select_ag in 5.11? Yeah, but that was more about cleaning up the code in xfs_inode.c by separating out the inode initialisation from the physical inode allocation. That cleanup and separation is what allows this series to simplify and clean up the inode allocation because it is no longer commingled with the inode initialisation after allocation... > Nice cleanup, at least... > > :) Yup, along with the 5.11 mods, we've chopped a lot of unnecessary code out of the inode allocation path... :) -Dave. -- Dave Chinner david@xxxxxxxxxxxxx