From: Theodore Ts'o [mailto:tytso@xxxxxxx]
Sent: Thursday, January 03, 2013 9:37 AM
To: Staffan Tjernstrom
Cc: Steven Rostedt; linux-rt-users@xxxxxxxxxxxxxxx; tglx@xxxxxxxxxxxxx; C.Emde@xxxxxxxxx; jkacur@xxxxxxxxxx
Subject: Re: Observed deadlock in ext4 under 3.2.23-rt37 & 3.2.33-rt50

> In fs/jbd2/transaction.c? Can you give me the code snippet and/or function
> and line number that you're concerned about?

Rather in include/linux/fs.h and fs/namei.c. I think that's where I ended up when I traced a previous encounter with the issue (via open() and/or truncate() calls from userland). Coming in from jbd2/transaction.c would make more sense than what I thought I had traced out by hand, however.

For the change I got suspicious about, compare
http://lxr.linux.no/#linux+v2.6.33.20/fs/namei.c#L325
with
http://lxr.linux.no/#linux+v3.2.33/include/linux/fs.h#L2286

> Yeah, but do_get_write_access() blocks (usually waiting for the jbd2 kernel
> thread to complete, but possibly on a memory allocation); we don't return
> EAGAIN or anything like that. So I don't see how that would cause a wait loop.
> It's possible we could be returning -ENOMEM; are you looping for all write
> failures, or just for EAGAIN/EINTR and partial writes?

Blocking would fit better with what I was seeing. The loop around the write() failures deep inside libstdc++'s output stream code may just have been me not managing to navigate that library particularly well.
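
To make the distinction in the quoted question concrete, here is a minimal sketch of a write loop that retries only on EINTR, EAGAIN and partial writes, and gives up on everything else (for example ENOMEM). It is an illustration only, not the libstdc++ code path discussed above; write_all is a hypothetical helper name, and treating a zero-byte write as EIO is my own assumption to keep the loop from spinning.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from libstdc++ or the kernel): write all
 * len bytes to fd, retrying only on conditions that are safe to retry.
 * Returns 0 on success, -1 with errno set on a non-retryable failure.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n > 0) {
            /* Full or partial write: advance past what was written. */
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n == 0) {
            /* Assumption: no progress is treated as an error. */
            errno = EIO;
            return -1;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Non-blocking fd: wait until it is writable, then retry. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };

            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                  /* e.g. ENOMEM, ENOSPC, EIO: give up */
    }
    return 0;
}
```

A caller that instead loops on every failure would spin forever on a persistent error such as ENOMEM. A write() that simply blocks inside the kernel never returns to a loop like this at all, which is why blocking and looping look different from userland.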