RE: Observed deadlock in ext4 under 3.2.23-rt37 & 3.2.33-rt50

Staffan Tjernstrom <stjernstrom@xxxxxxxxxxxxxx> · Thu, 3 Jan 2013 08:29:44 -0600

Thanks for all the prompt responses, very much appreciated.

Since this is occurring on production boxes, we're somewhat constrained in troubleshooting. For now we've re-initialized the disks as ext2 to get around the journal commits, although, as you suggest, that may not be the actual solution.

I'll certainly keep my eyes open and if we see something similar I will see if we can get the extra debug info that Steven suggested in his earlier e-mail.

In the example I caught, the process in get_write_access last ran on cpu 0 at SCHED_RR prio 70; the other three (all in jbd2_log_wait_commit) last ran on cpu 6 (the hyperthread opposite cpu 0) at regular prio 0.

-----Original Message-----
From: Steven Rostedt [mailto:rostedt@xxxxxxxxxxx] 
Sent: Thursday, January 03, 2013 7:22 AM
To: Theodore Ts'o
Cc: Staffan Tjernstrom; linux-rt-users@xxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; C.Emde@xxxxxxxxx; jkacur@xxxxxxxxxx
Subject: Re: Observed deadlock in ext4 under 3.2.23-rt37 & 3.2.33-rt50

On Wed, 2013-01-02 at 23:22 -0500, Theodore Ts'o wrote:
> On Wed, Jan 02, 2013 at 10:09:43PM -0500, Steven Rostedt wrote:
> > -- Steve
> > 
> > > Trace 1:
> > > [<ffffffffa01a085d>] jbd2_log_wait_commit+0xcd/0x150 
> > > [<ffffffffa01b74a5>] ext4_sync_file+0x1e5/0x480 
> > > [<ffffffff8117a42b>] vfs_fsync_range+0x2b/0x30 
> > > [<ffffffff8117a44c>] vfs_fsync+0x1c/0x20
> > > [<ffffffff8117a68a>] do_fsync+0x3a/0x60
> > > [<ffffffff8117a6c3>] sys_fdatasync+0x13/0x20
> > > [<ffffffff814e7feb>] system_call_fastpath+0x16/0x1b
> 
> Is this process running at a real-time priority?  If so, it looks like 
> a classic priority inversion problem.  fsync() triggers a journal 
> commit, and then waits for the jbd2 process to do the work.  If you 
> have real-time threads/processes which prevent the jbd2 process from 
> scheduling, that would explain what's going on.
> 
> In general, real-time processes/threads should *not* be doing file 
> system I/O, but if you must, you need to make sure that you've 
> adjusted the jbd2 kernel threads to run at the same or slightly higher 
> priority than the highest priority process which will be writing to 
> the file system.

With -rt things can get worse too. This can even be caused by kjournald being raised in priority.

Anytime you have something that does the following in order to break lock ordering:

repeat:
	lock(A);
	<do something>
	if (!trylock(B)) {
		/* B is held: drop A and retry rather than block on B */
		unlock(A);
		cpu_relax();
		goto repeat;
	}

We can livelock, because spinlocks in -rt turn into mutexes. Thus, the holder of lock B may not be running on another CPU but may instead be on the current CPU, waiting for the process that is in this loop. If that process happens to be an RT task, then the system stops.
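
For anyone who wants to experiment with the retry loop outside the kernel, here is a rough userspace sketch using pthreads. It is only an illustration under assumptions, not the jbd2 code: the function name lock_both_with_retry and the max_attempts bound are made up for this example (the kernel loop has no bound, which is exactly why it can spin forever), and sched_yield() merely stands in for cpu_relax().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Userspace sketch of the trylock/retry pattern.
 * Returns the number of attempts needed to hold both A and B, or -1 if
 * max_attempts was exhausted (A has been released again in that case).
 * max_attempts is hypothetical: it only exists so the sketch terminates.
 */
int lock_both_with_retry(pthread_mutex_t *a, pthread_mutex_t *b,
                         int max_attempts)
{
    assert(a != b); /* A and B must be two distinct locks */

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        pthread_mutex_lock(a);
        /* <do something> under A would go here */
        if (pthread_mutex_trylock(b) == 0)
            return attempt;          /* both A and B are now held */
        pthread_mutex_unlock(a);     /* back off so B's holder can take A */
        sched_yield();               /* stand-in for cpu_relax() */
    }
    return -1; /* B never became free within the bound */
}
```

If the caller runs under SCHED_FIFO or SCHED_RR and the thread holding B is a lower-priority thread on the same CPU, sched_yield() does not let that thread run, so every retry fails the same way. In this sketch that shows up as a -1 return; without the bound it would be the livelock described above.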

-- Steve
