On Tue 23-10-12 19:57:09, Eric Sandeen wrote:
> On 10/23/12 5:19 PM, Theodore Ts'o wrote:
> > On Tue, Oct 23, 2012 at 09:57:08PM +0100, Nix wrote:
> >>
> >> It is now quite clear that this is a bug introduced by one or more of
> >> the post-3.6.1 ext4 patches (which have all been backported at least to
> >> 3.5, so the problem is probably there too).
> >>
> >> [ 60.290844] EXT4-fs error (device dm-3): ext4_mb_generate_buddy:741: group 202, 1583 clusters in bitmap, 1675 in gd
> >> [ 60.291426] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
> >>
> >
> > I think I've found the problem. I believe the commit at fault is commit
> > 14b4ed22a6 (upstream commit eeecef0af5e):
> >
> >     jbd2: don't write superblock when if its empty
> >
> > which first appeared in v3.6.2.
> >
> > The reason why the problem happens rarely is that the effect of the
> > buggy commit is that if the journal's starting block is zero, we fail
> > to truncate the journal when we unmount the file system. This can
> > happen if we mount and then unmount the file system fairly quickly,
> > before the log has a chance to wrap. After the first time this has
> > happened, it's not a disaster, since when we replay the journal, we'll
> > just replay some extra transactions. But if this happens twice, the
> > oldest valid transaction will still not have gotten updated, but some
> > of the newer transactions from the last mount session will have gotten
> > written by the very latest transactions, and when we then try to do
> > the extra transaction replays, the metadata blocks can end up getting
> > very scrambled indeed.
>
> I'm stumped by this; maybe Ted can see if I'm missing something.
>
> (and Nix, is there anything special about your fs? Any nondefault
> mkfs or mount options, external journal, inordinately large fs, or
> anything like that?)
>
> The suspect commit added this in jbd2_mark_journal_empty():
>
>         /* Is it already empty? */
>         if (sb->s_start == 0) {
>                 read_unlock(&journal->j_state_lock);
>                 return;
>         }
>
> thereby short circuiting the function.
>
> But Ted's suggestion that mounting the fs, doing a little work, and
> unmounting before we wrap would lead to this doesn't make sense to
> me. When I do a little work, s_start is at 1, not 0. We start
> the journal at s_first:
>
> load_superblock()
>         journal->j_first = be32_to_cpu(sb->s_first);
>
> And when we wrap the journal, we wrap back to j_first:
>
> jbd2_journal_next_log_block():
>         if (journal->j_head == journal->j_last)
>                 journal->j_head = journal->j_first;
>
> and j_first comes from s_first, which is set at journal creation
> time to be "1" for an internal journal.
>
> So s_start == 0 sure looks special to me; so far I can only see that
> we get there if we've been through jbd2_mark_journal_empty() already,
> though I'm eyeballing jbd2_journal_get_log_tail() as well.
>
> Ted's proposed patch seems harmless but so far I don't understand
> what problem it fixes, and I cannot recreate getting to
> jbd2_mark_journal_empty() with a dirty log and s_start == 0.

  Agreed. I rather think we might miss journal->j_flags |= JBD2_FLUSHED when
shortcircuiting jbd2_mark_journal_empty(). But I still don't exactly see
how that would cause the corruption...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
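
For illustration only, here is a minimal userspace sketch of the wrap argument quoted above. It is not kernel code: the toy_* names and the cut-down struct are made up for this example, and only the wrap rule follows the quoted jbd2_journal_next_log_block() snippet. Assuming j_first is 1, as quoted for an internal journal, the head should never land on block 0 however often the log wraps:

```c
#include <assert.h>

/* Toy model, NOT the kernel's journal_t: just the fields the argument uses. */
struct toy_journal {
	unsigned int j_first;	/* first usable log block (from s_first) */
	unsigned int j_last;	/* one past the last usable log block */
	unsigned int j_head;	/* next log block to hand out */
};

/* Hand out the current head, then advance and wrap back to j_first. */
static unsigned int toy_next_log_block(struct toy_journal *j)
{
	unsigned int blk = j->j_head;

	j->j_head++;
	if (j->j_head == j->j_last)
		j->j_head = j->j_first;
	return blk;
}

/* Returns 1 if log block 0 is ever handed out during n allocations. */
static int toy_ever_hands_out_zero(unsigned int first, unsigned int last,
				   unsigned int n)
{
	struct toy_journal j = { first, last, first };
	unsigned int i;

	assert(first < last);	/* the model needs at least one usable block */
	for (i = 0; i < n; i++)
		if (toy_next_log_block(&j) == 0)
			return 1;
	return 0;
}
```

With first = 1 the helper reports that block 0 is never handed out, which matches the observation that s_start == 0 does not arise from ordinary logging or wrapping. It says nothing about the jbd2_mark_journal_empty() or jbd2_journal_get_log_tail() paths, which this sketch does not model.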