On Fri, Aug 15, 2008 at 08:02:35AM -0400, Theodore Tso wrote:
> On Thu, Aug 14, 2008 at 10:05:48AM +0100, David Woodhouse wrote:
> > I'm not sure how to do this for ext[34]. The sb_issue_discard() function
> > issues its requests as a soft barrier, because for naïve callers it
> > needs to ensure that the discard happens _before_ any subsequent writes
> > to the same sectors (if they get reallocated immediately).
> >
> > But ext[34] can probably do better than that, and submit the discard
> > requests _without_ barriers of their own. If someone with a bit more
> > clue does it, that is.
>
> It's worse than this. We can't call sb_issue_discard() until the
> transaction commits, since if we crash before the commit, the undelete
> will not have happened. (The block/inode bitmaps, inode table,
> et. al., aren't allowed to go out to disk until the transaction
> commit, and similarly, those sectors aren't allowed to get reused
> until the commit happens, as well.)
>
> This is going to be true of any filesystem which is doing journaling.
> What makes life a bit more difficult for ext4 is that we are doing
> physical block journaling, so we're not keeping track which blocks are
> getting discarded. (In contrast, systems that do logical journaling
> are keeping track of specific lists of blocks that are getting freed,
> since that's what they write to the journal.) This means we'll have
> to keep our own in-memory list of extents for which we should call
> sb_issue_discard() when the transaction finally commits. So this is
> something that we would have to track in the jbd/jbd2 layer, hanging
> off of the transaction structure. If we do this right, it will also
> be what OCFS2 can use too (since it uses the jbd layer as well.)

Don't both ext3 and ext4 do this via ext4_journal_get_undo_access and
ext4_mb_free_metadata?
We actually wait for the transaction to commit before freeing the
metadata blocks used by the transaction.

-aneesh
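
To make the bookkeeping discussed above concrete, here is a minimal userspace sketch of an in-memory list of freed extents hanging off a transaction, with the discards issued only at commit. It is not jbd/jbd2 code: struct txn, struct freed_extent, txn_defer_discard(), txn_commit(), txn_abort() and fake_issue_discard() are all hypothetical names made up for illustration, and fake_issue_discard() merely stands in for sb_issue_discard() without reproducing its real signature.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: one freed extent waiting for its transaction to commit. */
struct freed_extent {
	unsigned long long start;	/* first block of the extent */
	unsigned long long count;	/* number of blocks in the extent */
	struct freed_extent *next;
};

/* Hypothetical stand-in for the journal's transaction structure. */
struct txn {
	struct freed_extent *freed;	/* extents freed under this transaction */
	int committed;
};

/* Counters so the simulated discards can be observed. */
static unsigned long long discarded_blocks;
static int discard_calls;

/* Stand-in for sb_issue_discard(); the real kernel call is not reproduced. */
static void fake_issue_discard(unsigned long long start,
			       unsigned long long count)
{
	printf("discard: start=%llu count=%llu\n", start, count);
	discarded_blocks += count;
	discard_calls++;
}

/* Remember a freed extent; nothing is discarded at this point. */
static int txn_defer_discard(struct txn *t, unsigned long long start,
			     unsigned long long count)
{
	struct freed_extent *e;

	assert(!t->committed);	/* extents may only join a running transaction */
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->start = start;
	e->count = count;
	e->next = t->freed;
	t->freed = e;
	return 0;
}

/* Commit: only now is it safe to discard the extents and drop the list. */
static void txn_commit(struct txn *t)
{
	struct freed_extent *e = t->freed;

	t->committed = 1;
	while (e) {
		struct freed_extent *next = e->next;

		fake_issue_discard(e->start, e->count);
		free(e);
		e = next;
	}
	t->freed = NULL;
}

/* Crash or abort before commit: drop the list without discarding anything. */
static void txn_abort(struct txn *t)
{
	struct freed_extent *e = t->freed;

	while (e) {
		struct freed_extent *next = e->next;

		free(e);
		e = next;
	}
	t->freed = NULL;
}
```

In this sketch a caller would invoke txn_defer_discard() wherever blocks are freed and txn_commit() once the commit has completed. An aborted transaction never reaches fake_issue_discard(), which mirrors the crash-before-commit case described in the quoted text.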