Hi Ted:

On Tue, Sep 8, 2009 at 12:40 PM, Theodore Tso<tytso@xxxxxxx> wrote:
> On Tue, Sep 08, 2009 at 11:21:11AM -0700, Curt Wohlgemuth wrote:
>> Hi Valerie:
>>
>> On Tue, Sep 8, 2009 at 10:56 AM, Valerie Aurora<vaurora@xxxxxxxxxx> wrote:
>> > Hey, did you figure this out?  If not, I want to have a bug open
>> > somewhere.
>>
>> Yes, sorry.  I was going to post a patch for this, but have been
>> waiting to verify that it really fixes the issue.  And see the thread
>> started by Frank Mayhar about fsync issues as well...
>>
>> The problem is a race, between the last write to a to-be-freed
>> metadata block (to update the extent header) and the block being
>> marked free in the on-disk/buddy bitmaps.  Note that this only happens
>> without a journal, since *with* a journal the ordering is done
>> correctly.
>
> Just to clarify, this a race that shows up even without an unclean
> shutdown, right?

Correct.

>> Without a journal, the block buffer_head is written to, the
>> buffer_head is marked dirty, and the bitmaps are updated via
>> ext4_free_blocks().  In rare cases, the block is re-allocated for
>> another inode and written to -- subsequently, the writeback mechanism
>> will then flush the dirty extent header back to disk.  That's why it
>> looks like "leaked extent data" in the data block.
>
> If this shows up even without an unclean shutdown, then it sounds like
> the problem is a missing bforget() call.

I looked into this, and it may be merely my ignorance, but I don't see
how bforget() would solve the race.  All bforget() does is clear the
buffer's dirty bit.  Meanwhile, the page is still marked dirty, and can
be in the middle of writeback; it's true that
__block_write_full_page() will check the dirty bit for each buffer in
the page, but there doesn't seem to be any synchronization to ensure
that the write won't take place at some point in time after bforget()
is called.  Which means the write can happen after the bitmap is
changed.

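To make the ordering concrete, here is a toy user-space model of the
sequence I mean.  It is not kernel code: every name in it (toy_buffer,
toy_disk, toy_forget(), toy_run() and so on) is made up for
illustration, and toy_forget() models only the dirty-bit clearing
described above, not anything else bforget() may do.  The assumption
is that writeback has already sampled the dirty bit and queued the
write before the bit gets cleared.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model only: these are NOT kernel structures or kernel APIs. */
enum { BLK_SIZE = 16 };

struct toy_buffer {
    char data[BLK_SIZE];  /* in-memory copy of the metadata block */
    bool dirty;           /* stands in for the buffer's dirty bit */
    bool io_in_flight;    /* writeback has already committed to writing */
};

struct toy_disk {
    char block[BLK_SIZE]; /* on-disk contents of the single block */
    bool allocated;       /* stands in for the block's bitmap bit */
};

/* Writeback, step 1: sample the dirty bit and queue the write. */
static void toy_writeback_start(struct toy_buffer *bh)
{
    if (bh->dirty) {
        bh->dirty = false;
        bh->io_in_flight = true;
    }
}

/* Writeback, step 2: the queued write reaches the disk. */
static void toy_writeback_finish(struct toy_buffer *bh, struct toy_disk *d)
{
    if (bh->io_in_flight) {
        memcpy(d->block, bh->data, BLK_SIZE);
        bh->io_in_flight = false;
    }
}

/* Hypothetical stand-in for the dirty-bit clearing done by bforget(). */
static void toy_forget(struct toy_buffer *bh)
{
    bh->dirty = false;
}

/* Hypothetical "wait for the buffer": let any queued write complete. */
static void toy_wait(struct toy_buffer *bh, struct toy_disk *d)
{
    toy_writeback_finish(bh, d);
}

static void toy_free_block(struct toy_disk *d)
{
    d->allocated = false;
}

/* Another inode gets the block and its data goes straight to disk. */
static void toy_realloc_and_write(struct toy_disk *d, const char *payload)
{
    d->allocated = true;
    memset(d->block, 0, BLK_SIZE);
    strncpy(d->block, payload, BLK_SIZE - 1);
}

/* Returns true if the new owner's data is what ends up on disk. */
static bool toy_run(bool wait_before_free)
{
    struct toy_buffer bh = { .data = "EXTENT-HDR", .dirty = true };
    struct toy_disk d = { .allocated = true };

    toy_writeback_start(&bh);       /* writeback samples the dirty bit */
    toy_forget(&bh);                /* too late: the write is queued   */
    if (wait_before_free)
        toy_wait(&bh, &d);          /* drain the write before freeing  */
    toy_free_block(&d);
    toy_realloc_and_write(&d, "FILE-DATA");
    toy_writeback_finish(&bh, &d);  /* stale header lands, if pending  */

    return strcmp(d.block, "FILE-DATA") == 0;
}
```

Under those assumptions, toy_run(false) returns false, because the
stale extent header overwrites the new owner's data, and toy_run(true)
returns true, because the queued write drains before the block is
freed.
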
This is why I opted to wait for the buffer to be written out before
continuing on to ext4_free_blocks().

Am I missing something?

Thanks,
Curt