RE: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c

<Martin_Zielinski@xxxxxxxxxx> · Tue, 26 Apr 2011 07:45:33 -0500

I will port the jbd2 debugging code to jbd and will try to get the new kernel into production.
After a reboot we will have to wait several weeks for the bug to show up again. (Strangely, all of the affected machines failed within the same 72 hours.)

With an SQLite test program that does nothing else, I can currently produce ~10,000,000 commits in one hour.
I doubt that an overflow is possible within the short time span we are observing.
Maybe the __log_start_commit() call receives a corrupt target ID from somewhere else. But your patch will catch that, too.

Cheers,
Martin

-----Original Message-----
From: Ted Ts'o [mailto:tytso@xxxxxxx] 
Sent: Tuesday, 26 April 2011 14:24
To: Zielinski, Martin
Cc: linux-ext4@xxxxxxxxxxxxxxx
Subject: Re: 2.6.32 ext3 assertion j_running_transaction != NULL fails in commit.c

On Tue, Apr 26, 2011 at 04:07:11AM -0500, Martin_Zielinski@xxxxxxxxxx wrote:
> Ted!
> Thank you a lot!
> We observed this bug on ~10 out of 40 machines after an uptime of about 3 weeks. All run under comparable conditions.
> I will take a closer look at the debugfs output to verify whether the situation can happen within this short time range. Additionally, we installed a crash kernel and I changed the BUG into a panic(). 
> So I will be able to look at the journal structure if this happens again.

If you would be willing to port the debugging code that is in the
jbd2 patch into the jbd patch, and put this on your production
machines, that would be really great.  I can send you a revised jbd
patch if that would help (the debugging code in jbd2 should port
over to the jbd patch quite easily).

							- Ted