On Thu, Jun 12, 2008 at 03:33:53PM +0000, Linus Torvalds wrote:
> IOW, there are safety nets in place, but they tend to be fairly easily
> broken under certain circumstances.

Right, though this is a LAN with very few switches between the clients
and the server, and the issue happens with any client on that setup.

> Add to the above the possibility of just a kernel NFS bug (or a NFSd one),
> and it would really be very interesting to hear:
>
>  - do the errors seem to happen more at certain clients than others?

Not really. It happens more often with the script I inlined in one of my
mails than with the one I attached to it. And people working on _pure_
NFS hit the issue a bit less than the ones who keep their workdir on a
separate local device. That's all I can tell for now.

One of the developers told me that this usage pattern triggers the
problem more for him: he has a 'master' branch checked out in his NFS
home, and a 'local' branch in a workdir on his local hard drive[0]. The
issue happens more when he's working in the two workdirs at the same
time (for a value of "at the same time" that means within the same
minute, not the same nanosecond of course; he never commits in both
workdirs simultaneously). When he only works in his NFS 'master' or in
the local 'local' branch, it happens much less often.

> If it's a client-side problem, it really should happen more for certain
> kernel versions or certain hardware.

It doesn't, afaict. Clients are heterogeneous in kernel versions (.18,
.22, .24, .25 from what I've seen) and in hardware (all machines are
Dell computers, but from very different years, hence different mobos
and NICs. Some even have non-Dell gigabit NICs in them).

>  - have you had any other anecdotal evidence of problems with non-git
>    usage? Unexplained SIGSEGV's if you have binaries over NFS, for
>    example? Strange syntax errors when compiling over NFS?

Not really, no.
Our NFS server is remarkably stable with anything else (it's also
remarkably slow compared to local drives, but that's not really
relevant ;p).

> I'm not discounting a git bug, but quite frankly, it really is worth
> checking that your network/NFS setup is solid.

Well, to date I'd say it's quite solid. We are a software company and
have even tested our software (which makes heavy use of mmap, pread,
pwrite, and other things that NFS often doesn't deal with very well) on
that very server without a glitch that could be attributed to NFS (I
mean, we've had tons of bugs, but it was always in our software in the
end ;p). Though we very rarely pread what we just pwrite-d or things
like that, so maybe we never triggered a possible kernel bug either :)

  [0] the reason for that setup is that when we work on topic branches,
      we sometimes spot big bugs that are small to fix, and we use the
      "NFS" master branch to push those bugfixes as soon as we find
      them, whereas local is pushed only when the feature is ready.

-- 
·O·  Pierre Habouzit
··O  madcoder@xxxxxxxxxx
OOO  http://www.madism.org