On Thu, 11 Jan 2007, Michael S. Tsirkin wrote:
>
> OK, now that I know what happened, this makes sense to me.
> But I'd like to understand what created this in the first place:
>
> > - you had gotten some corrupt objects due to the disk filling up
> >   (probably during the pull that thus populated the object database
> >   with partially written objects)
> >
> > In particular, the 4d4d30be967d3284cbf59afd4fba6ab536e295f5 object was
> > corrupt. fsck gave a confusing error message:
> >
> >   error: 4d4d30be967d3284cbf59afd4fba6ab536e295f5: object not found
> >   error: c03590b581d51d5fa43adbef9415e935d0229412: object not found
> >
> > which is really because the _file_ for that object does exist, but the
> > file doesn't actually contain the object it expects (due to
> > corruption), so the object wasn't "found".
>
> is something expected?

Well, it's a bug, and it's not "expected", but I think it's understood.

What happens is that you pulled objects that were perfectly fine, and when you ran out of disk space git could create the _filenames_ and inodes for them, but when it actually wrote the data itself, it failed.

And it failed in a very unfortunate place: when unpacking the pack-file (which you got when doing the "git pull"), it would call "write_sha1_file()" to actually write the unpacked object. That one does everything correctly and is actually very careful, but then it does

	if (write_buffer(fd, compressed, size) < 0)
		die("unable to write sha1 file");

which _should_ have caught the case that the buffer couldn't be written, and died cleanly with an error message, and it would never have moved the temporary file that it wrote to into the real database.

In other words, it was doing everything right. A partial file would never have actually been allowed to be moved into the database at all, and you'd have gotten a nice error message, and the "git pull" would have failed gracefully.

However, "write_buffer()" itself was buggy.
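As an aside, the write-to-a-temporary-file-then-rename pattern described above is what keeps a failed write from ever landing in the object database. Here is a simplified sketch of that pattern (not git's actual code; `atomic_write_file()` and its error handling are illustrative only):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write buf to final_path via a temp file,
 * so a partial write never appears under the real name.
 * Returns 0 on success, -1 on any failure. */
static int atomic_write_file(const char *final_path, const void *buf, size_t len)
{
	char tmp[4096];
	ssize_t written;
	int fd;

	snprintf(tmp, sizeof(tmp), "%s.tmp", final_path);
	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	written = write(fd, buf, len);
	if (written < 0 || (size_t)written != len) {
		/* failed or partial write: throw the temp file away,
		 * the database never sees it (a real implementation
		 * would loop on short writes instead) */
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0) {
		unlink(tmp);
		return -1;
	}
	/* only now does the file appear under its final name */
	return rename(tmp, final_path);
}
```

The point is that the rename only happens after every write has been checked, which brings us back to the buggy write_buffer().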
It did:

	static int write_buffer(int fd, const void *buf, size_t len)
	{
		ssize_t size;

		size = write_in_full(fd, buf, len);
		if (!size)
			return error("file write: disk full");
		if (size < 0)
			return error("file write error (%s)", strerror(errno));
		return 0;
	}

and the problem here is that if the write was _partial_ (not totally empty), it would see a positive size, but not a full write, and it would incorrectly return zero for "everything is fine". Even though it wasn't.

So my patch to make "write_in_full()" have better semantics (and make it simpler too) should have fixed the fundamental problem that caused your failure in the first place. The other patch I just sent out should just make the error messages be a bit nicer, and isn't important in itself.

Btw, that "write_buffer()" bug was introduced pretty recently. It used to be that write_buffer() did the right thing: it used to do

	while (len) {
		ssize_t size;

		size = write(fd, buf, len);
		if (!size)
			return error("file write: disk full");
		if (size < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return error("file write error (%s)", strerror(errno));
		}
		len -= size;
		buf = (char *) buf + size;
	}

which is safe and does the right thing for partial writes and disk-full errors (of course it is: I wrote that piece of code).

So this bug was _literally_ introduced three days ago, and it was introduced by a set of patches that tried to _avoid_ the problem. Sad. The old code was perfectly good. The new code was unsafe, but thought it was better. See commit 93822c2239a336e5cb583549071c59202ef6c5b2 for details.

Junio, please apply my "fix write_in_full" patch asap if you haven't already. I find it very irritating when people "clean up" my code, and introduce bugs when they do so.

		Linus
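P.S. To make the failure mode concrete, here is a small self-contained demonstration of the bug pattern. The names are made up for illustration: chunked_write() stands in for a write() against a nearly full disk (it accepts at most 4 bytes per call), and buggy_write_buffer() reproduces the flawed check that treats any positive return as complete success:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* stand-in for write() on a nearly full disk:
 * accepts at most 4 bytes per call */
static long chunked_write(char *dst, size_t dst_room, const void *buf, size_t len)
{
	size_t n = len < 4 ? len : 4;
	if (n > dst_room)
		n = dst_room;
	memcpy(dst, buf, n);
	return (long)n;
}

/* the flawed check: only 0 and negative returns are treated as
 * errors, so a partial write slips through as "success" */
static int buggy_write_buffer(char *dst, size_t room, const void *buf, size_t len)
{
	long size = chunked_write(dst, room, buf, len);
	if (!size)
		return -1;	/* "disk full" */
	if (size < 0)
		return -1;	/* write error */
	return 0;		/* claims success even when size < len! */
}
```

Asking buggy_write_buffer() to store an 11-byte string lands only 4 bytes, yet it reports success - exactly the silent truncation that let corrupt objects into the database.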