On Sat, Aug 14, 2010 at 04:52:10PM +0200, Christoph Hellwig wrote:
> On Sat, Aug 14, 2010 at 10:14:51AM -0400, Ted Ts'o wrote:
> > Also, to be clear, the block layer will guarantee that a trim/discard
> > of block 12345 will not be reordered with respect to a write to block
> > 12345, correct?
>
> Right now that is what the hardbarrier does, and that's what we're
> trying to get rid of. For XFS we prevent this with something
> called the busy extent list: extents deleted by a transaction are
> inserted into it (it's actually an rbtree, not a list, these days),
> and before we can reuse blocks from it we need to ensure that the
> transaction is fully committed. Discards only happen off that list, and
> extents are only removed from it once the discard has finished. I assume
> other filesystems have a similar mechanism.

So ext4 does the transaction commit (which guarantees that the file
delete has hit the disk platters), and *then* issues the discard, and
*then* we zap the busy extent list. That's the only safe thing to do:
if we discarded and then crashed before the transaction got committed,
we would lose the data blocks, so I can't issue the discard until after
I wait for the commit block to finish.

This should be the case regardless of anything we change with respect
to how the discard operation works, since if we discard and then crash
before the commit block is written, data blocks that should not have
been discarded will be lost. Am I missing something?

So after the proposed flush/ordering changes, if the block device layer
is free to reorder the discard and a subsequent write to a discarded
block, I will need to add a *new* wait for the discard to complete
before I can free the busy extent list. And this will be true for all
file systems that are currently issuing discards.

Again, am I missing something?
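To make the ordering concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not XFS or ext4 code: the class and method names (BusyExtentTracker, run_discard_completions, and so on) are hypothetical, and "discard completion" is simulated by draining a queue, standing in for an end-io callback. It shows the sequence free -> commit -> discard submitted -> discard completed -> extent reusable, with allocation refused while the discard is still in flight.

```python
from collections import deque


class BusyExtentTracker:
    """Hypothetical toy model of a busy extent list (not real kernel code).

    Order enforced: free -> commit -> discard -> discard done -> reusable.
    """

    def __init__(self):
        self.busy = set()        # freed extents that must not be reused yet
        self.pending = deque()   # discards submitted but not yet completed
        self.events = []         # ordered event log, for inspection

    def free(self, extent):
        # Extent deleted by a transaction that is not yet committed.
        self.busy.add(extent)
        self.events.append(("free", extent))

    def commit(self):
        # The commit record is durable; only now may the old data be
        # discarded, since a crash before this point still needs the blocks.
        self.events.append(("commit",))
        for extent in sorted(self.busy):
            self.pending.append(extent)
            self.events.append(("discard_submitted", extent))

    def run_discard_completions(self):
        # Stand-in for end-io callbacks: an extent leaves the busy set
        # only once its discard has actually finished.
        while self.pending:
            extent = self.pending.popleft()
            self.busy.remove(extent)
            self.events.append(("discard_done", extent))

    def allocate(self, extent):
        # If the block layer may reorder a discard and a later write,
        # writing to a busy extent could race with its in-flight discard.
        if extent in self.busy:
            raise RuntimeError(f"extent {extent} is still busy")
        self.events.append(("allocate", extent))


# Usage: reuse of block 12345 is refused until its discard completes.
t = BusyExtentTracker()
t.free(12345)
t.commit()
try:
    t.allocate(12345)            # refused: discard still in flight
except RuntimeError as err:
    print(err)
t.run_discard_completions()
t.allocate(12345)                # now safe to reuse
```

The "new wait" discussed above corresponds to run_discard_completions: without it, the only thing keeping the discard and the later write apart is the block layer's implicit ordering.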
This implies that if the changes that allow reordering of the discard
and subsequent writes to the discarded blocks go in *before* we update
all of the filesystems, then there is the potential for data loss. And
while most file systems don't issue discards by default but require a
mount option, this still might be considered undesirable.

So that means we need to add the end-io callbacks to the discard
operations *first*, before we remove the implicit flush/ordering
guarantees.

I thought you were saying in your earlier messages that it should be
safe to remove the flush/ordering guarantees, but this leaves me quite
confused. Did I misunderstand you?

- Ted