On Thu, Apr 21, 2011 at 09:17:57AM -0500, Martin_Zielinski@xxxxxxxxxx wrote:
>
> I posted this BUG already on the ext3-users list without response.
> After making some new observations I hope, that someone here can
> tell me these make sense. Kernel output of the BUG is at the end of
> the mail.

Hi Martin,

Thanks for your observations. I don't necessarily always follow mail
sent to ext3-users, but fortunately I saw this note sent to the LKML
list.

> Here's some debug output that I put into the code:
> kernel: (fs/ext3/fsync.c, 77): ext3_sync_file: ext3_sync_file datasync=1 d_tid=27807 tid=27846
> kernel: (fs/jbd/journal.c, 467): log_start_commit: log start commit called with commit request=27845, tid=27807 running transaction=ffff8800266913c0 27846
>
> So the "really-commited" transaction id was advancing while this
> datasync_tid stayed the same and journal.c - log_start_commit() was
> called without waking the commit process.
>
> I wondered what happens if the current journal tid is overflowing
> (32bit unsigned integer). By forcing the tid in get_transaction to
> jump close to UINT_MAX, I could reproduce the BUG.

A simple overflow shouldn't cause the problem, because of how
tid_geq() is coded. However, if there have been 2**31 commits since
the fdatasync file has been opened, it's possible to trigger this.
That's a **lot** of commits, so I'm not sure I'm completely happy with
this theory.

Nevertheless, I believe this set of patches (one for ext4, and one for
ext3), should prevent the crash from happening.

- Ted
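
To illustrate the point about tid_geq(): the following is a minimal
user-space sketch, not the kernel source. It is modeled on the
signed-difference comparison that tid_geq() uses. The naive_geq()
helper is hypothetical and exists only for contrast. The cast to
int32_t assumes the usual two's-complement wraparound, which GCC and
Clang provide but which ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* jbd transaction IDs are 32-bit unsigned counters that wrap around. */
typedef uint32_t tid_t;

/*
 * Wraparound-aware "x is at or after y", in the style of tid_geq():
 * subtract modulo 2^32, then interpret the result as a signed value.
 * This is correct as long as x and y are less than 2^31 apart.
 */
static int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);
	return difference >= 0;
}

/*
 * Hypothetical naive comparison, shown only for contrast.
 * It gives the wrong answer as soon as the counter wraps past UINT32_MAX.
 */
static int naive_geq(tid_t x, tid_t y)
{
	return x >= y;
}
```

With the values from the debug output, tid_geq(27846, 27807) is true,
as expected. A counter that has just wrapped also compares correctly:
tid_geq(5, UINT32_MAX - 2) is true, while naive_geq() gets it wrong.
The comparison only breaks when a stale tid is 2**31 or more commits
behind: tid_geq(27807 + 0x80000000u, 27807) is false, so the newer
transaction appears to be older than the stale one.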