Re: [PATCH v2 04/11] xfs: CoW fork operations should only update quota reservations

On Fri, Jan 26, 2018 at 08:02:16AM -0500, Brian Foster wrote:
> On Thu, Jan 25, 2018 at 10:20:03AM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 25, 2018 at 08:03:53AM -0500, Brian Foster wrote:
> > > On Wed, Jan 24, 2018 at 05:20:35PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > 
> > > > Since the CoW fork only exists in memory, it is incorrect to update the
> > > > on-disk quota block counts when we modify the CoW fork.  Unlike the data
> > > > fork, even real extents in the CoW fork are only reservations (on-disk
> > > > they're owned by the refcountbt) so they must not be tracked in the
> > > > on-disk quota info.
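To make the distinction concrete, here is a deliberately simplified user-space model. It is not XFS code: struct model_dquot and the model_* functions are hypothetical stand-ins with two counters, the on-disk block count and an in-core count of real usage plus outstanding reservations. Under the rule described above, a CoW fork reservation and its cancellation should only ever move the in-core counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an in-core dquot (not XFS code). */
struct model_dquot {
	long disk_bcount;	/* blocks charged in the on-disk quota record */
	long res_bcount;	/* in-core: real usage plus outstanding reservations */
	long hard_limit;	/* block hard limit, 0 means unlimited */
};

/* Reserve @n blocks in core only, refusing to exceed the hard limit. */
bool model_reserve(struct model_dquot *dq, long n)
{
	if (dq->hard_limit && dq->res_bcount + n > dq->hard_limit)
		return false;
	dq->res_bcount += n;
	return true;
}

/* Drop an in-core reservation of @n blocks; the on-disk count is untouched. */
void model_unreserve(struct model_dquot *dq, long n)
{
	assert(n >= 0 && n <= dq->res_bcount);
	dq->res_bcount -= n;
}
```

Nothing in the model touches disk_bcount, which is the property the patch is after for CoW fork extents.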
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > ---
> > > > v2: make documentation more crisp and to the point
> > > > ---
> > > >  fs/xfs/libxfs/xfs_bmap.c |  118 ++++++++++++++++++++++++++++++++++++++++++----
> > > >  fs/xfs/xfs_quota.h       |   14 ++++-
> > > >  fs/xfs/xfs_reflink.c     |    8 ++-
> > > >  3 files changed, 122 insertions(+), 18 deletions(-)
> > > > 
> ...
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index 82abff6..e367351 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -599,10 +599,6 @@ xfs_reflink_cancel_cow_blocks(
> > > >  					del.br_startblock, del.br_blockcount,
> > > >  					NULL);
> > > >  
> > > > -			/* Update quota accounting */
> > > > -			xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > -					-(long)del.br_blockcount);
> > > > -
> > > >  			/* Roll the transaction */
> > > >  			xfs_defer_ijoin(&dfops, ip);
> > > >  			error = xfs_defer_finish(tpp, &dfops);
> > > > @@ -795,6 +791,10 @@ xfs_reflink_end_cow(
> > > >  		if (error)
> > > >  			goto out_defer;
> > > >  
> > > > +		/* Charge this new data fork mapping to the on-disk quota. */
> > > > +		xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
> > > > +				(long)del.br_blockcount);
> > > > +
> > > 
> > > Should this technically be XFS_TRANS_DQ_DELBCOUNT? The blocks obviously
> > > aren't delalloc and this transaction doesn't make a quota reservation so
> > > I don't think it screws up accounting. But if the transaction did make a
> > > quota reservation, it seems like this would account the extent against
> > > the tx reservation where it instead should recognize that cow blocks
> > > have already been reserved (which is essentially what DELBCOUNT means,
> > > IIUC).
> > 
> > Hmmm, there's a subtlety here -- we're opencoding what DELBCOUNT does,
> > because the subsequent xfs_bmap_del_extent_cow unconditionally reduces
> > the in-core reservation after we've mapped in the extent as if it had
> > been accounted as a real extent all along.  But considering all the
> > blather about how cow fork blocks are treated as incore reservations, it
> > does look funny, doesn't it?
> > 
> 
> Ok... I missed that the end/del cases were tied together, then reconfused
> myself over the accounting in the end_cow() path (re: our irc chat
> yesterday) when reassessing that bit. So to reset my brain, we have the
> following with this current patch:
> 
> - cow reserve does a delalloc and in-core dquot reservation
> - cow real alloc either skips dquot adjustment if wasdel, else reduces
>   the quota res acquired by the transaction by the size of the alloc[1].
>   Either way we leave around an in-core quota reservation as if the blocks
>   remained delalloc.
> - A cancel at this point simply kills the in-core dquot reservation
>   along with the cow fork blocks.
> - end_cow() unmaps the current data fork blocks and decrements
>   associated real quota usage (tx), remaps the cow blocks and increments
>   real quota usage (tx), then kills off the in-core dquot reservation.

Correct.
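The lifecycle above can be sketched with the same kind of toy model. The names are hypothetical, not the real dquot or transaction structures; the model also skips the real-allocation step and assumes the end_cow transaction made no quota reservation of its own. In this version of the patch, end_cow records both the unmap and the remap as BCOUNT-style deltas and drops the CoW reservation directly, and the commit applies the net delta to both counters.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the real XFS structures. */
struct model_dquot {
	long disk_bcount;	/* blocks charged in the on-disk quota record */
	long res_bcount;	/* in-core: real usage plus outstanding reservations */
};

struct model_trans {
	long bcount_delta;	/* BCOUNT-style delta, applied at commit */
};

/* CoW reserve: delalloc-style, in-core reservation only. */
void model_cow_reserve(struct model_dquot *dq, long n)
{
	dq->res_bcount += n;
}

/*
 * end_cow as in this version of the patch: unmap @old data fork blocks
 * and remap @cow blocks as BCOUNT deltas, then drop the CoW reservation
 * directly (the model's stand-in for xfs_bmap_del_extent_cow()).
 */
void model_end_cow(struct model_dquot *dq, struct model_trans *tp,
		   long old, long cow)
{
	assert(cow >= 0 && cow <= dq->res_bcount);
	tp->bcount_delta -= old;
	tp->bcount_delta += cow;
	dq->res_bcount -= cow;
}

/*
 * Commit, assuming no transaction quota reservation: a BCOUNT delta is
 * applied to both the on-disk count and the in-core count.
 */
void model_commit(struct model_dquot *dq, struct model_trans *tp)
{
	dq->disk_bcount += tp->bcount_delta;
	dq->res_bcount += tp->bcount_delta;
	tp->bcount_delta = 0;
}
```

Starting from 100 blocks used, reserving 8 CoW blocks and then remapping them over 4 old data fork blocks ends with both counters at 104, but the in-core count dips back to 100 between end_cow and the commit.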

> [1] Would this even be necessary if we just acquired a delalloc like
> reservation in xfs_reflink_allocate_cow() rather than associate the
> reservation with the transaction in the first place (assuming we have
> enough information to cover error handling, extent manipulations and
> whatnot)?

Originally CoW did make delalloc reservations even for direct writes, but
Christoph thought that we could avoid the overhead of running through
the CoW fork an extra time by mapping directly to the CoW fork.

> When the tx commits, this essentially has the effect of applying the
> bcount delta to both the on-disk dquot and the in-core res. The former
> reflects the change in the file on-disk and the latter is rectified
> because the field accounts for the current real usage plus outstanding
> reservation. The original cowblocks res has been dropped directly, so
> the bcount delta reflects the change to the data fork.

<nod>

> If we instead use delbcount in end_cow(), we're telling the transaction
> to drop bcount by whatever old data fork blocks were removed and that
> we've converted N delalloc (cow fork, actually) blocks that already had
> in-core reservation. Therefore, transaction commit updates the on-disk
> dquot just the same (-dataforkblocks + delallocblocks), but delbcount
> blocks have already updated the in-core dquot res so the transaction has
> nothing else to do there (and so we must also not remove that
> reservation in del_cow()). This approach does seem like it requires a
> bit less mental gymnastics to follow because it more closely resembles
> delalloc quota accounting. ;)

Yes, that's less brain muddling; last night's patchpile incorporates
that.
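Here is the DELBCOUNT variant in the same toy model (again hypothetical names, assuming no transaction quota reservation). The remapped blocks are recorded as already reserved, so end_cow leaves the in-core reservation alone and the commit moves only the on-disk count for them.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the real XFS structures. */
struct model_dquot {
	long disk_bcount;	/* blocks charged in the on-disk quota record */
	long res_bcount;	/* in-core: real usage plus outstanding reservations */
};

struct model_trans {
	long bcount_delta;	/* BCOUNT-style: hits both counters at commit */
	long delbcount_delta;	/* DELBCOUNT-style: in-core already reserved */
};

/* CoW reserve: delalloc-style, in-core reservation only. */
void model_cow_reserve(struct model_dquot *dq, long n)
{
	assert(n >= 0);
	dq->res_bcount += n;
}

/* end_cow with DELBCOUNT: the CoW reservation is converted, not dropped. */
void model_end_cow(struct model_trans *tp, long old, long cow)
{
	tp->bcount_delta -= old;	/* old data fork blocks go away */
	tp->delbcount_delta += cow;	/* already-reserved blocks become real */
	/* No in-core unreserve here: the reservation now backs real blocks. */
}

/* Commit: DELBCOUNT blocks update only the on-disk count. */
void model_commit(struct model_dquot *dq, struct model_trans *tp)
{
	dq->disk_bcount += tp->bcount_delta + tp->delbcount_delta;
	dq->res_bcount += tp->bcount_delta;
	tp->bcount_delta = 0;
	tp->delbcount_delta = 0;
}
```

From the same starting state the final counters match the BCOUNT version; the difference is that the in-core reservation is never released and re-acquired in between.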

> Another thing that I'm not sure has been considered here is whether
> doing the bcount delta in the transaction and dropping the cowblocks res
> from the dquot directly leaves a race window where the quota can overrun
> a limit. E.g., since the transaction has to up the in-core res in the
> original example at commit time, is there anything that locks out
> further external reservation from the dquot between the time the in-core
> res is dropped and the transaction commits?

Yes, that's a theoretical race (as in I've never seen it happen) that
is fixed by using delbcount in end_cow.
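The window can be illustrated with a single-threaded simulation in which the second writer's reservation is an explicit step between end_cow and the commit rather than real concurrency. The names and the simple hard-limit check are hypothetical. With BCOUNT plus a direct unreserve, the second reservation fits under the limit and the commit then pushes the in-core count past it; with DELBCOUNT the CoW reservation is still held, so the second reservation is refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real XFS structures. */
struct model_dquot {
	long disk_bcount;	/* on-disk block count */
	long res_bcount;	/* in-core: real usage plus reservations */
	long hard_limit;	/* block hard limit */
};

struct model_trans {
	long bcount_delta;	/* hits both counters at commit */
	long delbcount_delta;	/* hits only the on-disk count at commit */
};

bool model_reserve(struct model_dquot *dq, long n)
{
	if (dq->res_bcount + n > dq->hard_limit)
		return false;
	dq->res_bcount += n;
	return true;
}

void model_commit(struct model_dquot *dq, struct model_trans *tp)
{
	dq->disk_bcount += tp->bcount_delta + tp->delbcount_delta;
	dq->res_bcount += tp->bcount_delta;
	tp->bcount_delta = 0;
	tp->delbcount_delta = 0;
}

/*
 * Run end_cow for @cow blocks replacing @old data fork blocks while a
 * second writer tries to reserve @other blocks before the commit.
 * Returns whether the second reservation succeeded.
 */
bool model_end_cow_race(struct model_dquot *dq, bool use_delbcount,
			long old, long cow, long other)
{
	struct model_trans tp = { 0, 0 };
	bool other_ok;

	tp.bcount_delta -= old;
	if (use_delbcount) {
		tp.delbcount_delta += cow;	/* keep the reservation */
	} else {
		tp.bcount_delta += cow;
		dq->res_bcount -= cow;		/* drop it before commit */
	}

	/* The window: nothing here stops another reservation. */
	other_ok = model_reserve(dq, other);

	model_commit(dq, &tp);
	return other_ok;
}
```

With a 110-block limit, 100 blocks used and 8 reserved for CoW, the BCOUNT path lets a 10-block reservation through and ends at 114 in core; the DELBCOUNT path refuses it and ends at 104.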

> > So perhaps the solution is to pass intent into xfs_bmap_del_extent_cow:
> > if we're calling it from _end_cow then we want to hang on to the
> > reservation so that delbcount can do its thing, but if we're calling
> > from _cancel_cow then we're dumping the extent and reservation.
> > 
> 
> Indeed. But since those are the only callers and we'd already update
> delbcount from end_cow(), could we not just lift the del_cow() decrement
> into the cancel_cow() function? FWIW, some extra comments around quota
> manipulation in the reflink functions would also be useful for future
> reference.

Hm, yes, could do that too.

TBH I had the moment of "doh, just call the quota unreserve in
cancel_cow directly instead of at the end of del_extent_cow" right after
I hit send. :(
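A sketch of that restructuring in the toy model, with hypothetical names throughout: the del_extent_cow stand-in only manipulates the fork, cancel_cow owns the unreserve, and end_cow hands the reservation to the transaction as a DELBCOUNT-style delta.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins; not the real XFS structures. */
struct model_dquot {
	long disk_bcount;	/* on-disk block count */
	long res_bcount;	/* in-core: real usage plus reservations */
};

struct model_trans {
	long bcount_delta;	/* hits both counters at commit */
	long delbcount_delta;	/* hits only the on-disk count at commit */
};

struct model_inode {
	long cow_blocks;		/* blocks mapped in the CoW fork */
	struct model_dquot *dq;
};

void model_cow_reserve(struct model_inode *ip, long n)
{
	ip->cow_blocks += n;
	ip->dq->res_bcount += n;
}

/* Fork manipulation only: no quota side effects. */
void model_del_extent_cow(struct model_inode *ip, long n)
{
	assert(n >= 0 && n <= ip->cow_blocks);
	ip->cow_blocks -= n;
}

/* Cancel owns the unreserve: blocks and reservation go away together. */
void model_cancel_cow(struct model_inode *ip, long n)
{
	model_del_extent_cow(ip, n);
	ip->dq->res_bcount -= n;
}

/* End converts the reservation into real usage via the transaction. */
void model_end_cow(struct model_inode *ip, struct model_trans *tp,
		   long old, long n)
{
	model_del_extent_cow(ip, n);
	tp->bcount_delta -= old;
	tp->delbcount_delta += n;
}

void model_commit(struct model_dquot *dq, struct model_trans *tp)
{
	dq->disk_bcount += tp->bcount_delta + tp->delbcount_delta;
	dq->res_bcount += tp->bcount_delta;
	tp->bcount_delta = 0;
	tp->delbcount_delta = 0;
}
```

Keeping the quota call out of the shared helper means each caller states its own intent, which is roughly the "pass intent" idea without adding a flag.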

--D

> Brian
> 
> > --D
> > 
> > > 
> > > Other than that the code seems Ok to me.
> > > 
> > > Brian
> > > 
> > > >  		/* Remove the mapping from the CoW fork. */
> > > >  		xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
> > > >  
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html


