Re: discard and barriers

"Ted Ts'o" <tytso@xxxxxxx> · Sun, 15 Aug 2010 13:39:06 -0400

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write to block
> > 12345, correct?
> 
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.  For XFS we prevent this by something that is
> called the busy extent list - extents deleted by a transaction are
> inserted into it (it's actually a rbtree not a list these days),
> and before we can reuse blocks from it we need to ensure that it
> is fully committed.  Discards only happen off that list and extents
> are only removed from it once the discard has finished.  I assume
> other filesystems have a similar mechanism.
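The busy extent rule described above can be sketched in C roughly as follows. This is a minimal model, not XFS code: it assumes a fixed-size array instead of the rbtree XFS actually uses, and the names `busy_insert`, `busy_remove` and `can_reuse` are hypothetical. The point is only that the allocator refuses any range overlapping an extent still on the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical busy extent list: extents freed by a transaction stay
 * here until the commit is stable and the discard has finished.
 * XFS uses an rbtree; a fixed array keeps this sketch short. */
#define MAX_BUSY 64

struct extent {
	unsigned long start;	/* first block */
	unsigned long len;	/* number of blocks */
};

static struct extent busy[MAX_BUSY];
static size_t nbusy;

/* Half-open ranges [start, start + len) overlap. */
static bool extents_overlap(struct extent a, struct extent b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Called when a transaction frees an extent. */
static void busy_insert(unsigned long start, unsigned long len)
{
	assert(nbusy < MAX_BUSY);
	busy[nbusy++] = (struct extent){ start, len };
}

/* Called only once the commit is on disk and the discard completed. */
static void busy_remove(unsigned long start, unsigned long len)
{
	for (size_t i = 0; i < nbusy; i++) {
		if (busy[i].start == start && busy[i].len == len) {
			busy[i] = busy[--nbusy];	/* order is irrelevant */
			return;
		}
	}
}

/* The allocator must not hand out any range overlapping a busy extent. */
static bool can_reuse(unsigned long start, unsigned long len)
{
	struct extent want = { start, len };

	for (size_t i = 0; i < nbusy; i++)
		if (extents_overlap(busy[i], want))
			return false;
	return true;
}
```

For example, after `busy_insert(100, 10)`, a request for block 105 is refused while blocks 110 and up remain available, until `busy_remove(100, 10)` runs.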

So ext4 does the transaction commit (which guarantees that the file
delete has hit the disk platters), and *then* issues the discard, and
*then* we zap the busy extent list.  That's the only safe thing to do:
if we crash before the transaction gets committed, the delete is rolled
back and the file still needs its data blocks, so I can't issue the
discard until after I wait for the commit block to finish.  This should
be the case regardless of anything we change with respect to how the
discard operation works, since if we discard and then crash before the
commit block is written, we lose data blocks that should not have been
discarded.  Am I missing something?
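The crash-safety argument above can be checked with a tiny C model. It is a sketch under simplifying assumptions, not ext4 code: the hypothetical `enum step` values stand for "commit block is on stable storage", "discard has reached the device" and "busy extent released", and the model assumes journal replay only honours committed transactions, so an uncommitted delete brings the file (and its need for the old blocks) back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical steps in freeing the blocks of a deleted file. */
enum step { COMMIT, DISCARD, RELEASE_BUSY };

/* After a crash, replay only honours committed transactions.  If the
 * delete never committed, the file comes back and still needs its
 * blocks, so a discard that already hit the device destroyed data. */
static bool crash_loses_data(bool committed, bool discarded)
{
	return discarded && !committed;
}

/* Crash after every prefix of 'order'; report whether any crash point
 * loses data.  RELEASE_BUSY does not affect on-disk state here. */
static bool order_is_unsafe(const enum step *order, size_t n)
{
	bool committed = false, discarded = false;

	for (size_t i = 0; i < n; i++) {
		if (order[i] == COMMIT)
			committed = true;
		else if (order[i] == DISCARD)
			discarded = true;
		/* simulated crash after step i */
		if (crash_loses_data(committed, discarded))
			return true;
	}
	return false;
}
```

With this model, `{ COMMIT, DISCARD, RELEASE_BUSY }` survives a crash at every point, while `{ DISCARD, COMMIT, RELEASE_BUSY }` loses data if the crash lands between the discard and the commit.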

So after the proposed flush/ordering changes, if the block device
layer is free to reorder the discard and a subsequent write to a
discarded block, I will need to add a *new* wait for the discard to
complete before I can free the busy extent list.  And this will be
true for all file systems that are currently issuing discards.  Again,
am I missing something?
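Why the new wait matters can be shown with a one-block C model. Everything here is hypothetical: a discarded block is modelled as reading back 0 (real devices may return zeroes or stale data), and "reordering" is modelled as the device running its queue in reverse. Without the wait, the write of new data to a reused block can be executed before the discard, which then wipes it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-block device; a discarded block reads as 0. */
enum req { REQ_DISCARD, REQ_WRITE };

struct device {
	enum req queue[2];
	int nqueued;
	int block;		/* current contents of the block */
};

static void submit(struct device *d, enum req r)
{
	assert(d->nqueued < 2);
	d->queue[d->nqueued++] = r;
}

/* Execute queued requests in submission order or reversed, standing in
 * for a block layer with no discard/write ordering guarantee. */
static void run_queue(struct device *d, bool reorder, int new_data)
{
	for (int i = 0; i < d->nqueued; i++) {
		enum req r = d->queue[reorder ? d->nqueued - 1 - i : i];
		d->block = (r == REQ_DISCARD) ? 0 : new_data;
	}
	d->nqueued = 0;
}

/* Discard the freed block, release the busy extent, then let the
 * allocator reuse the block for new_data.  Returns what is on disk. */
static int reuse_block(bool wait_for_discard, bool reorder, int new_data)
{
	struct device d = { .nqueued = 0, .block = 42 };	/* old data */

	submit(&d, REQ_DISCARD);
	if (wait_for_discard)
		run_queue(&d, reorder, new_data);	/* the new wait */
	/* busy extent released here, so the write may now be issued */
	submit(&d, REQ_WRITE);
	run_queue(&d, reorder, new_data);
	return d.block;
}
```

In this model `reuse_block(false, true, 7)` ends with the new data destroyed, while waiting for the discard before releasing the busy extent keeps it regardless of how the device orders requests.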

This implies that if the changes to allow reordering of the discard
and subsequent writes to the discarded blocks go in *before* we update
all of the filesystems, then there is the potential for data loss.
And while most file systems don't issue discards by default but
require a mount option, this still might be considered undesirable.

So that means we need to add the end-io callbacks to the discard
operations *first*, before we remove the implicit flush/ordering
guarantees.
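One way to picture "end-io callbacks on the discard operations" is the C sketch below. The structures and functions are hypothetical, loosely modelled on how a bio reports completion through its `bi_end_io` callback; they are not the kernel API. The busy extent is released only from the completion callback, never at submission time. The sketch also assumes a failed discard is harmless for reuse (the blocks just keep unreferenced old data), which a real filesystem would want to confirm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical busy extent: blocks must not be reused while busy. */
struct busy_extent {
	unsigned long start, len;
	bool busy;
};

/* Hypothetical discard request with a completion callback. */
struct discard_req {
	unsigned long start, len;
	void (*end_io)(struct discard_req *req, int err);
	void *private;		/* points at the busy extent */
};

/* Completion callback: only now may the allocator reuse the blocks.
 * Assumption: a failed discard leaves old, unreferenced data behind,
 * so the extent is released whether or not err is set. */
static void discard_end_io(struct discard_req *req, int err)
{
	struct busy_extent *be = req->private;

	(void)err;
	assert(be->busy);
	be->busy = false;
}

/* Issued after the transaction commit; does NOT release the extent. */
static void issue_discard(struct discard_req *req, struct busy_extent *be)
{
	req->start = be->start;
	req->len = be->len;
	req->end_io = discard_end_io;
	req->private = be;
}

/* Stand-in for the device finishing the request. */
static void device_complete(struct discard_req *req, int err)
{
	req->end_io(req, err);
}
```

Between `issue_discard()` and `device_complete()` the extent stays busy, so the allocator cannot hand the blocks out to a write that might overtake the discard.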

I thought you were saying that it should be safe to remove the
flush/ordering guarantees in your earlier messages, but this is
leaving me quite confused.  Did I misunderstand you?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html