Re: tr_ifree transaction log reservation calculation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 23, 2017 at 09:36:53AM -0500, Brian Foster wrote:
> On Thu, Nov 23, 2017 at 11:24:02AM +1100, Dave Chinner wrote:
> > On Wed, Nov 22, 2017 at 03:41:26PM -0500, Brian Foster wrote:
> > > On Wed, Nov 22, 2017 at 07:21:01AM -0500, Brian Foster wrote:
> > > > On Wed, Nov 22, 2017 at 01:26:45PM +1100, Dave Chinner wrote:
> > > > > On Tue, Nov 21, 2017 at 12:35:57PM -0500, Brian Foster wrote:
> > > > > I'm still not 100% sure what the extra "2 +" is accounting for
> > > > > there, though, so to be safe that migh need to be accounted as full
> > > > > blocks, not log headers.
> > > > > 
> > > > 
> > > > Same.. there is still a bit of cruft in this calculation that really
> > > > isn't clear. That +2 is one part and then there's another
> > > > xfs_calc_buf_res(1, 0) right before the m_ialloc_blks line. The fact
> > > > that it's a separate line implies it is logically separate, but who
> > > > knows.
> > > > 
> > > > I suppose I'll keep that line as is, replace the inobt entry bit with
> > > > the full inobt calculation (as above), keep the +2 associated with the
> > > > inobt calculation and run some tests on that.
> > 
> > I think I've worked out what the +2 is supposed to be - the AGF
> > and AGFL when we alloc/free blocks for either the inode chunk
> > or inobt records. You see them in, say, xfs_calc_write_reservation()
> > but not in the inode allocation reservations, despite needing
> > to do allocation....
> > 
> 
> That sounds sane.. though it looks like in the case of
> xfs_calc_write_reservation() those are accounted as sectors rather than
> full blocks as in the ifree calculation. I suppose we should fix those
> up in the ifree calculation to reflect the sector size and properly
> document them.
> 
> Hmm, xfs_calc_write_reservation() also handles the space freeing bits a
> bit more how I would expect: using the max of two calculations since a
> part of the operation occurs after a transaction roll. The ifree
> calculation just adds everything together (and we still overrun it :/).

Well, I don't think that's going to matter - more below...

[....]

> > So if we want to cover the worst case, we need a reservation for
> > each freespace tree modification, not one to cover them all. i.e.
> > one for each AGFL modification, and one for the actual extent being
> > allocated/freed. And we need one of those reservations for
> > *every single allocation/free that can occur in a transaction*.
> > 
> > This rapidly blows out to being impractical. We're not going to get
> > full height splits/joins on every single allocation that occurs in a
> > transaction, and the number of potential allocations in a
> > transaction can be considered essentially unbound. e.g.  how many
> > allocations can a single a directory modification do? Hash btree
> > split, data/node btree split,  new freespace block, too - especially
> > when considering that a each directory block could - in the worse
> > case - be 128 x 512 byte extents (64k dir block size on 512 byte
> > block size filesystem).
> > 
> 
> The impression I get is that at the time of designing these transactions
> it was simply easiest to characterize worst case of a single operation
> as a full btree modification, and then either there was an assumption
> made that would be sufficient to cover other multiple allocation/free
> cases (since the full tree changes are relatively rare) or that was just
> borne out in practice and so was never really revisited. 

Yup, that seems to fit the pattern over time of "increase the
reservation when it runs out" fixes....

> > So, no, I don't think it's realistic to start trying to cater for
> > worst case reservations here. Such crazy shit just doesn't happen
> > often enough in the real world for us to penalise everything with
> > such reservations.
> > 
> 
> Agreed.
> 
> > In which case, I think we need to consider that each major operation
> > (inobt, directory, finobt, etc) probably should have it's own
> > allocation reservation. It's extremely unlikely that all of them are
> > going to overrun in the same transaction, so maybe this is a good
> > enough safe guard without going completely over the top...
> > 
> 
> For the ifree example, I take this to mean we should refactor the
> calculation to consider freeing of the inode chunk separate from the
> inobt and finobt modifications with respect to free space tree
> modifications.  In other words, we account one allocfree res for the
> inode chunk, another for the inobt modification (to effectively cover
> agfl fixups) and another (optional) for the finobt as opposed to one for
> the whole transaction. Is that what you're suggesting?
> 
> That sounds reasonable, but at the same time I'm a bit concerned that I
> think the ifree transaction should technically be refactored into a max
> of the pre/post tx roll reservations similar to the write reservation
> mentioned above. For example:
> 
> 	MAX(inode + inobt + finobt + agi + inode chunk + etc.,
> 	    allocfree for inode chunk + (allocfree * nr inobts));

Yeah, I think you're right in that we need to we need to look at
redefining the reservations at a higher level.

For this specific example, though, we have to take into account that
a finobt modification to mark an inode free can insert a new record
into the btree. Hence we have scope for a finobt block allocation
and tree split, so we need to have allocfree reservations in the
primary transaction.

Also, the inode btree modifications (and refcountbt, too) call
xfs_free_extent() directly - they don't use deferops free list that
is processed as an EFI/EFD op after the primary transaction has been
rolled.  i.e:

xfs_btree_delrec
  xfs_btree_free_block
    xfs_inobt_free_block
      xfs_free_extent

Hence the primary transaction reservation has to take into account
that operations that remove a record from the tree can cause and
extent to be freed. As such, the removal of an inode chunk - which
removes the record from both the inobt and finobt - can trigger
freespace tree modification directly.

IOWs, the primary ifree transaction needs alloc/free reservations
for the modifications of both the inobt and the finobt, regardless
of anything else that goes on, so.....

> But that would end up making the transaction smaller than adding it all
> together. Of course, that may be fine with the agfl fixup below..

.... I doubt it will shrink if we take into account the multiple
direct alloc/free operations in the primary transaction properly. :/

Hmmmm - I'm wondering if we need to limit the max extents in a
defer-ops EFI/EFD item to 2 so we don't try to free all the defered
extent frees in a single EFI/EFD transaction pair. i.e. force it
to roll the transaction and get a new log reservation every two
extents being freed so we can limit the reservation size defered
extent freeing requires....

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux