Re: discard and barriers

Chris Mason <chris.mason@xxxxxxxxxx> · Sat, 14 Aug 2010 11:46:36 -0400

On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write to block
> > 12345, correct?
> 
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.

So btrfs will wait_on_{page/buffer/bio} to meet all ordering
requirements. This holds both for transaction commit and for discard.
Reiserfs has the exception you already know about.

> For XFS we prevent this by something that is
> called the busy extent list - extents deleted by a transaction are
> inserted into it (it's actually an rbtree, not a list, these days),
> and before we can reuse blocks from it we need to ensure that the
> transaction is fully committed.  Discards only happen off that list,
> and extents are only removed from it once the discard has finished.
> I assume other filesystems have a similar mechanism.
> 
> > And on SATA devices, where discard requests are not queued requests,
> > the ata layer will have to do a queue flush *before* the discard is
> > sent, right?

Another way to say this is that we have to be 100% sure that if we write
something after a discard, the storage will do that write after it does
the discard.

I'm not actually worried about writes before the discard, because the
worst case for us is the drive fails to discard something it could have
(this is the drive's problem).  Cache flushes from the FS will cover the
case where transaction commits depend on the data going in before the
discard. 

I care a lot about the write after the discards though.  If the discards
themselves become async, that's ok too as long as we have some way to do
end_io processing on them.
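
To make the ordering rule concrete, here is a minimal userspace sketch in
plain C. It is a toy model under stated assumptions, not btrfs or XFS code:
every name in it (toy_fs, toy_alloc, toy_discard_end_io, and so on) is
hypothetical. It models an extent that is freed by a transaction, discarded
after commit, and only handed back for new writes once the discard's end_io
has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum extent_state {
    EXT_FREE,        /* safe to allocate and write to */
    EXT_IN_USE,      /* allocated, holds live data */
    EXT_BUSY,        /* freed by a transaction that has not committed yet */
    EXT_COMMITTED,   /* freeing transaction committed, discard not issued */
    EXT_DISCARDING,  /* discard in flight, waiting for its end_io */
};

#define NR_EXTENTS 8

struct toy_fs {
    enum extent_state state[NR_EXTENTS];
};

static void toy_init(struct toy_fs *fs)
{
    for (size_t i = 0; i < NR_EXTENTS; i++)
        fs->state[i] = EXT_FREE;
}

/* Allocation is the only path to a new write; it refuses busy extents. */
static bool toy_alloc(struct toy_fs *fs, size_t i)
{
    if (i >= NR_EXTENTS || fs->state[i] != EXT_FREE)
        return false;
    fs->state[i] = EXT_IN_USE;
    return true;
}

/* A transaction frees the extent: it goes onto the "busy list". */
static void toy_free_extent(struct toy_fs *fs, size_t i)
{
    assert(i < NR_EXTENTS && fs->state[i] == EXT_IN_USE);
    fs->state[i] = EXT_BUSY;
}

/* Transaction commit: busy extents become eligible for discard. */
static void toy_commit(struct toy_fs *fs)
{
    for (size_t i = 0; i < NR_EXTENTS; i++)
        if (fs->state[i] == EXT_BUSY)
            fs->state[i] = EXT_COMMITTED;
}

/* Issue discards for committed extents; returns how many were issued. */
static int toy_issue_discards(struct toy_fs *fs)
{
    int issued = 0;

    for (size_t i = 0; i < NR_EXTENTS; i++) {
        if (fs->state[i] == EXT_COMMITTED) {
            fs->state[i] = EXT_DISCARDING;
            issued++;
        }
    }
    return issued;
}

/* Completion handler for one discard: only now may the extent be reused. */
static void toy_discard_end_io(struct toy_fs *fs, size_t i)
{
    assert(i < NR_EXTENTS);
    if (fs->state[i] == EXT_DISCARDING)
        fs->state[i] = EXT_FREE;
}
```

The only point of the sketch is that toy_alloc() keeps failing for an extent
until toy_discard_end_io() has run for it, so a write can never be queued
ahead of a discard of the same blocks, even if the discard itself completes
asynchronously.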

-chris
