Re: "failed to read delta base object at..."

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 27 Aug 2008 12:48:00 -0700 (PDT)

On Wed, 27 Aug 2008, Nicolas Pitre wrote:
> 
> And isn't the bad data block size and alignment a bit odd for a 
> filesystem crash corruption?

Yes. If it was a filesystem issue, I'd expect it to be at least disk block 
aligned (512 bytes, most of the time) and more likely filesystem block 
aligned (ie mostly 4kB).
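
To make the alignment argument concrete, here is a minimal sketch of the check one might run on a damaged byte range. The function name and its parameters are hypothetical, purely for illustration; nothing like it exists in git.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the damaged byte range
 * [start, start + len) starts on a multiple of block_size and
 * covers a whole number of blocks, which is what you would expect
 * from a lost disk or filesystem block. Returns 0 otherwise.
 */
int is_block_aligned(size_t start, size_t len, size_t block_size)
{
	assert(block_size > 0);
	return len > 0 && start % block_size == 0 && len % block_size == 0;
}
```

Calling it with a block_size of 512 and then 4096 covers the two cases above. A range that fails both checks points away from a plain lost-block scenario.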

However, if we were to re-write the file afterwards, it could still get 
non-block-aligned corruption - simply because there was a 
non-block-aligned rewrite that got lost. But we don't actually ever do 
that, except for the header and the SHA1 at the end in some unusual cases.

> However, in the pack-objects case, it is almost impossible to have such 
> a corruption since the data is SHA1 summed immediately before being 
> written out.

Yes. Anything that uses the "sha1write()" model (which includes the 
regular pack-file _and_ the index) should generally be pretty safe. 

However, we do have this odd case of fixing up the pack after-the-fact 
when we receive it from somebody else (because we get a thin pack and 
don't know how many objects the final result will have). And that case 
seems to be not as safe, because it

 - re-reads the file to recompute the SHA1

   This is understandable, and it's fairly ok, but it does mean that there 
   is a bigger chance of the SHA1 matching if something has corrupted the 
   file in the meantime!

   (That was not the case of this corruption, obviously, since the SHA1 
   didn't match)

 - but it also forgets to fsync the result, because it only did that in
   one path rather than in all cases of fixup.

   Again, this wasn't actually the cause of this corruption, because the 
   corruption wasn't near the header or tail, so if it had been due to a 
   missed write due to missing an fsync, the pattern would have been 
   different.

Anyway, we should fix the latter problem regardless, even if it's (a) damn 
unlikely and (b) definitely not the case in this thing.

The fix is trivial - just move the "fsync_or_die()" into the fixup routine 
rather than doing it in one of the callers.
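
The shape of the fixup described above, with the sync folded into the routine so that no caller can forget it, can be sketched roughly as follows. This is not the git code: the file layout (a 4-byte count in host byte order, a payload, then a 4-byte trailer), the toy checksum standing in for SHA1, and the names fixup_and_sync() and fixup_demo() are all assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Toy rolling checksum standing in for SHA1; not cryptographic. */
static uint32_t toy_sum(uint32_t sum, const unsigned char *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/*
 * Hypothetical layout: 4-byte count at offset 0, payload up to
 * data_len, then a 4-byte checksum trailer written at data_len.
 * Returns 0 on success, -1 on any I/O error.
 */
int fixup_and_sync(int fd, uint32_t count, off_t data_len)
{
	unsigned char buf[4096];
	uint32_t sum = 0;
	off_t left = data_len;

	assert(fd >= 0 && data_len >= (off_t)sizeof(count));

	/* 1. Small, non-block-aligned rewrite of the header. */
	if (lseek(fd, 0, SEEK_SET) < 0 ||
	    write(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	/* 2. Re-read the file from the start to recompute the checksum. */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	while (left > 0) {
		size_t want = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
		ssize_t got = read(fd, buf, want);
		if (got <= 0)
			return -1;
		sum = toy_sum(sum, buf, (size_t)got);
		left -= got;
	}

	/* 3. The file offset is now data_len: write the trailer there. */
	if (write(fd, &sum, sizeof(sum)) != (ssize_t)sizeof(sum))
		return -1;

	/* 4. Sync inside the routine, so every caller gets it. */
	return fsync(fd);
}

/* Usage example: build a tiny file, fix it up, verify the result. */
int fixup_demo(void)
{
	unsigned char file[13];
	uint32_t count = 7, got_count, got_sum;
	int ok;
	int fd = open("fixup_demo.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	unlink("fixup_demo.tmp");	/* file goes away when fd is closed */
	ok = write(fd, "\0\0\0\0hello", 9) == 9 &&
	     fixup_and_sync(fd, count, 9) == 0 &&
	     lseek(fd, 0, SEEK_SET) == 0 &&
	     read(fd, file, sizeof(file)) == (ssize_t)sizeof(file);
	close(fd);
	if (!ok)
		return -1;
	memcpy(&got_count, file, sizeof(got_count));
	memcpy(&got_sum, file + 9, sizeof(got_sum));
	return (got_count == count && got_sum == toy_sum(0, file, 9)) ? 0 : -1;
}
```

Note that step 2 checksums whatever is in the file at that moment, which is exactly why a corruption that sneaks in before the re-read would still produce a matching trailer.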

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

		Linus

---
 builtin-pack-objects.c |    1 -
 pack-write.c           |    1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 2dadec1..d394c49 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -499,7 +499,6 @@ static void write_pack_file(void)
 		} else {
 			int fd = sha1close(f, NULL, 0);
 			fixup_pack_header_footer(fd, sha1, pack_tmp_name, nr_written);
-			fsync_or_die(fd, pack_tmp_name);
 			close(fd);
 		}
 
diff --git a/pack-write.c b/pack-write.c
index a8f0269..ddcfd37 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -179,6 +179,7 @@ void fixup_pack_header_footer(int pack_fd,
 
 	SHA1_Final(pack_file_sha1, &c);
 	write_or_die(pack_fd, pack_file_sha1, 20);
+	fsync_or_die(pack_fd, pack_name);
 }
 
 char *index_pack_lockfile(int ip_out)