On Thu, 2008-06-26 at 22:57 -0400, J. Bruce Fields wrote:
> On Thu, Jun 26, 2008 at 04:38:40PM -0700, Junio C Hamano wrote:
> > logank@xxxxxxxx writes:
> >
> > > On Jun 26, 2008, at 1:56 PM, Junio C Hamano wrote:
> > >
> > >>> "The file shouldn't be short unless someone truncated it, or there
> > >>> is a bug in index-pack. Neither is very likely, but I don't think
> > >>> we would want to retry pread'ing the same block forever.
> > >>
> > >> I don't think we would want to retry even once. Return value of 0
> > >> from pread is defined to be an EOF, isn't it?
> > >
> > > No, it seems to be a simple error-out in this case. We have 2.4.20
> > > systems with nfs-utils 0.3.3 and used to frequently get the same error
> > > while pushing. I made a similar change back in February and haven't
> > > had a problem since:
> > >
> > > diff --git a/index-pack.c b/index-pack.c
> > > index 5ac91ba..85c8bdb 100644
> > > --- a/index-pack.c
> > > +++ b/index-pack.c
> > > @@ -313,7 +313,14 @@ static void *get_data_from_pack(struct object_entry *obj)
> > >  	src = xmalloc(len);
> > >  	data = src;
> > >  	do {
> > > +		// It appears that if multiple threads read across NFS, the
> > > +		// second read will fail. I know this is awful, but we wait for
> > > +		// a little bit and try again.
> > >  		ssize_t n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > +		if (n <= 0) {
> > > +			sleep(1);
> > > +			n = pread(pack_fd, data + rdy, len - rdy, from + rdy);
> > > +		}
> > >  		if (n <= 0)
> > >  			die("cannot pread pack file: %s", strerror(errno));
> > >  		rdy += n;
> > >
> > > I use a sleep request since it seems less likely that the other thread
> > > will have an outstanding request after a second of waiting.
> >
> > Gaah. Don't we have NFS experts in house? Bruce, perhaps?
>
> Trond, you don't have any idea why a 2.6.9-42.0.8.ELsmp client (2.4.28
> server) might be returning spurious 0's from pread()?
>
> Seems like everything is happening from that one client--the file isn't
> being simultaneously accessed from the server or from another client.

Is the file only being read, or could there be a simultaneous write to
the same file? I'm surmising this could be an effect resulting from
simultaneous cache invalidations: prior to Linux 2.6.20 or so, we
weren't rigorously following the VFS/VM rules for page locking, and so
page cache invalidation in particular could have some curious
side-effects.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
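
For anyone following the pread() discussion in this thread, here is a
minimal sketch (not the patch quoted above, and not git's actual code)
of a read loop that keeps the two failure cases apart: a negative
return, where errno is meaningful, and a return of 0, which POSIX
defines as end-of-file and which does not set errno. The helper name
pread_fully, its return codes, and the max_zero_retries parameter are
all hypothetical; the one-second pause simply mirrors the quoted patch.

```c
#define _XOPEN_SOURCE 700	/* for pread() and mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of git: read exactly len bytes
 * starting at offset from into buf.
 *
 * Returns  0 on success,
 *         -1 on a real I/O error (errno was set by pread),
 *         -2 if pread still returned 0 (premature EOF) after
 *            max_zero_retries extra attempts.
 */
static int pread_fully(int fd, void *buf, size_t len, off_t from,
		       int max_zero_retries)
{
	char *data = buf;
	size_t rdy = 0;
	int zero_retries = 0;

	assert(max_zero_retries >= 0);

	while (rdy < len) {
		ssize_t n = pread(fd, data + rdy, len - rdy,
				  from + (off_t)rdy);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry at once */
			return -1;		/* errno describes the failure */
		}
		if (n == 0) {
			/* EOF: errno is not meaningful here. */
			if (zero_retries++ >= max_zero_retries)
				return -2;
			sleep(1);	/* assumption: same pause as the patch */
			continue;
		}
		rdy += (size_t)n;
		zero_retries = 0;	/* progress resets the retry budget */
	}
	return 0;
}
```

A caller would report strerror(errno) only for the -1 case and a plain
"premature end of file" message for -2, which avoids printing a stale
error string when the read merely came back empty. The bounded retry
count addresses the worry about retrying the same block forever.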