RE: Observed deadlock in ext4 under 3.2.23-rt37 & 3.2.33-rt50

Staffan Tjernstrom <stjernstrom@xxxxxxxxxxxxxx> · Thu, 3 Jan 2013 09:52:06 -0600

From: Theodore Ts'o [mailto:tytso@xxxxxxx] 
Sent: Thursday, January 03, 2013 9:37 AM
To: Staffan Tjernstrom
Cc: Steven Rostedt; linux-rt-users@xxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; C.Emde@xxxxxxxxx; jkacur@xxxxxxxxxx
Subject: Re: Observed deadlock in ext4 under 3.2.23-rt37 & 3.2.33-rt50

>In fs/jbd2/transaction.c?  Can you give me the code snippet and/or function and line number that you're concerned about?

Rather in include/linux/fs.h and fs/namei.c. I think that's where I ended up when tracing a previous encounter with the issue (via open() and/or truncate() calls from userland). Coming in from jbd2/transaction.c would make more sense than what I thought I had manually traced out, however.

See http://lxr.linux.no/#linux+v2.6.33.20/fs/namei.c#L325 vs http://lxr.linux.no/#linux+v3.2.33/include/linux/fs.h#L2286 for the change I got suspicious about.

>Yeah, but do_get_write_access() blocks (usually waiting for the jbd2 kernel thread to complete, but possibly on a memory allocation); we don't return EAGAIN or anything like that.  So I don't see how that would cause a wait loop.

>It's possible we could be returning -ENOMEM; are you looping for all write failures, or just for EAGAIN/EINTR and partial writes?

Blocking would make more sense with what I was seeing. The loop around the write() failures deep inside libstdc++'s output stream code may just have been me not managing to navigate that library particularly well.
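
For reference, the distinction asked about above (retrying only on EAGAIN/EINTR and partial writes, rather than on every write failure) can be sketched in C roughly as follows. This is a minimal illustration only: write_all() is a hypothetical helper name, and it is not the libstdc++ code path discussed in this thread.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from libstdc++ or the kernel):
 * write all of buf to fd, retrying only on EINTR, EAGAIN and
 * partial writes. Any other failure (ENOMEM, EIO, ENOSPC, ...)
 * is returned to the caller instead of being retried forever.
 */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Non-blocking fd: sleep until writable, don't spin. */
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                (void)poll(&pfd, 1, -1);
                continue;
            }
            return -1;                  /* hard error: errno is set */
        }
        p += n;                         /* partial write: advance */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

With a loop of this shape, a write() that blocks in the kernel simply never returns to user space, whereas a loop that retried on every error would spin in user space if write() kept failing with something like ENOMEM.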