On Fri, 8 Feb 2008 13:49:01 -0500 (EST)
"david m. richter" <richterd@xxxxxxxxxxxxxx> wrote:

> On Fri, 8 Feb 2008, J. Bruce Fields wrote:
>
> > On Fri, Feb 08, 2008 at 07:15:02AM -0500, Jeff Layton wrote:
> > > On Thu, 7 Feb 2008 18:26:18 -0500
> > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > >
> > > > On Sun, Jan 20, 2008 at 09:58:59AM -0500, Oleg Drokin wrote:
> > > > > Hello!
> > > > >
> > > > > On Jan 18, 2008, at 6:07 PM, J. Bruce Fields wrote:
> > > > >
> > > > >> On Thu, Nov 29, 2007 at 02:41:57PM -0800, Marc Eshel wrote:
> > > > >>> The problem seems to be with the fact that the client and server are
> > > > >>> on
> > > > >>> the same machine. This test work fine with or without an underlaying
> > > > >>> fs
> > > > >>> that supports locking when the client and the server are on a
> > > > >>> different
> > > > >>> machines. Like you said the server is trying to send the grant
> > > > >>> message to
> > > > >>> the client but for some reason it fails when the client is on the
> > > > >>> same
> > > > >>> machine.
> > > > >> That *shouldn't* make a difference, so we need to take another look at
> > > > >> this--Oleg, this problem is still unfixed, right?
> > > > >
> > > > > Yes, I just pulled your latest nfs tree and I still can reproduce the
> > > > > problem.
> > > >
> > > > OK, we have finally reproduced this problem here, and David's working on
> > > > debugging. It does indeed seem to only be reproduceable with client and
> > > > server on the same machine. Thanks for the report....
> > > >
> > > > --b.
> > >
> > > It might be worth testing this both with and without the patchset I
> > > posted to linux-nfs recently to take care of the lockd hang. If
> > > lockd is stuck trying to rpc_ping itself then it probably would hang
> > > like this, wouldn't it?
> >
> > Of course! Yes, that fits.
> >
> > --b.
>
> right on, jeff, good catch and thanks for directing my attention
> to your patches.
>

Excellent! Glad that took care of it...

> i applied them on top of 2.6.23.1 and tested them on a cluster
> exporting GFS2 over NFS, using oleg's reproducer code. your patches fix
> that lockd hang.
>
> in a bit more detail, oleg's reproducer basically gets a
> whole-file read lock, tests the lock, upgrades to a whole-file exclusive
> lock, tests the lock, then unlocks. the problem was that when getting
> that exclusive lock things would hang. this only happened when the client
> and server were on the same machine, and i could reproduce it with NFS
> exporting GFS2 but not NFS exporting EXT3.
>
>

Interesting. It's not clear to me why the underlying filesystem would make
any difference there. Though now that I look, it looks like fl_grant
really only gets called from dlm code, and that queues up the block for
an immediate grant callback attempt. So perhaps that's the reason.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
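
P.S. For anyone following along, the lock sequence David describes above can be sketched roughly as below. This is not Oleg's actual reproducer, just a minimal illustration using POSIX fcntl() locks. The function names (run_lock_sequence, set_whole_file_lock, test_whole_file_lock) are made up for this sketch, and the use of blocking F_SETLKW requests and F_GETLK for the "test" steps is an assumption on my part.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Set or clear a whole-file POSIX lock. l_start = 0 with l_len = 0 covers
 * the entire file. F_SETLKW blocks until the request can be granted
 * (assumption: the real reproducer may use F_SETLK instead). */
static int set_whole_file_lock(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET };
    return fcntl(fd, F_SETLKW, &fl);
}

/* Ask whether a whole-file lock of the given type would conflict with a
 * lock held by another process. Returns the l_type reported by F_GETLK
 * (F_UNLCK means no conflict), or -1 on error. A process's own locks
 * never show up as conflicts here. */
static int test_whole_file_lock(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET };
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type;
}

/* Hypothetical driver: read lock, test, upgrade to exclusive, test, unlock.
 * Returns 0 if every step succeeded, -1 otherwise. */
int run_lock_sequence(const char *path)
{
    int rc = -1;
    int fd = open(path, O_RDWR);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    if (set_whole_file_lock(fd, F_RDLCK) == -1) {
        perror("read lock");
        goto out;
    }
    if (test_whole_file_lock(fd, F_WRLCK) == -1) {
        perror("test after read lock");
        goto out;
    }
    /* The upgrade to an exclusive lock is the step reported to hang. */
    if (set_whole_file_lock(fd, F_WRLCK) == -1) {
        perror("upgrade to exclusive lock");
        goto out;
    }
    if (test_whole_file_lock(fd, F_WRLCK) == -1) {
        perror("test after exclusive lock");
        goto out;
    }
    if (set_whole_file_lock(fd, F_UNLCK) == -1) {
        perror("unlock");
        goto out;
    }
    rc = 0;
out:
    close(fd);
    return rc;
}
```

To try it, call run_lock_sequence() on an existing file under the NFS mount, with client and server on the same machine. On a local filesystem the whole sequence should simply return 0.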