On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write block
> > 12345, correct?
>
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of.

So btrfs will wait_on_{page/buffer/bio} to meet all ordering
requirements.  This holds both for transaction commit and for discard.
Reiserfs has the exception you already know about.

> For XFS we prevent this by something that is
> called the busy extent list - extents deleted by a transaction are
> inserted into it (it's actually a rbtree not a list these days),
> and before we can reuse blocks from it we need to ensure that it
> is fully committed.  discards only happen off that list and extents
> are only removed from it once the discard has finished.  I assume
> other filesystems have a similar mechanism.
>
> > And on SATA devices, where discard requests are not queued requests,
> > the ata layer will have to do a queue flush *before* the discard is
> > sent, right?

Another way to say this is that we have to be 100% sure that if we
write something after a discard, the storage will do that write after
it does the discard.

I'm not actually worried about writes before the discard, because the
worst case for us is that the drive fails to discard something it could
have (this is the drive's problem).  Cache flushes from the FS will
cover the case where transaction commits depend on the data going in
before the discard.

I care a lot about the write after the discards though.  If the
discards themselves become async, that's ok too as long as we have some
way to do end_io processing on them.

-chris
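
To make the write-after-discard rule concrete, here is a rough toy model
of the "block stays busy until its discard completes" idea.  It is only
a sketch: BusyExtentTracker, free_block, discard_end_io and write_block
are made-up names for illustration, not kernel or filesystem APIs, and
the "device" is just a list recording the order requests were issued in.

```python
# Toy model only: every name here is hypothetical, not a kernel API.
class BusyExtentTracker:
    def __init__(self):
        self.busy = set()      # blocks freed whose discard has not completed
        self.device_log = []   # order in which requests reach the "device"

    def free_block(self, block):
        """A transaction frees a block: mark it busy and issue the discard."""
        self.busy.add(block)
        self.device_log.append(("discard", block))

    def discard_end_io(self, block):
        """Completion callback for the discard: the block may be reused now."""
        self.busy.discard(block)

    def write_block(self, block):
        """Issue a write only if no discard is still in flight for the block."""
        if block in self.busy:
            return False       # caller must wait for discard_end_io() first
        self.device_log.append(("write", block))
        return True


tracker = BusyExtentTracker()
tracker.free_block(12345)
early = tracker.write_block(12345)   # refused: discard still in flight
tracker.discard_end_io(12345)
late = tracker.write_block(12345)    # allowed: discard has completed
```

The point of the sketch is that the filesystem, not the block layer,
holds back the write until end_io for the discard has run, so the
device never sees the write ahead of the discard for the same block.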