Re: Recovering from repository corruption

On Tue, 10 Jun 2008, Denis Bueno wrote:
> 
> > Do you have some odd filesystem in play? Was the current corruption in a
> > similar environment as the old one? IOW, I'm trying to find a pattern
> > here, to see if there might be something we can do about it..
> 
> I can't remember if the old one happened after a panic or not, but I'd
> bet it did.  The filesystem is HFS+, as indeed most OS X 10.4
> installations are.  Maybe the HD has been going south?  However, that
> doesn't seem likely, since when I got the computer it was new, and
> that was around Jun 2007.

Yeah, it's almost certainly not the disk. Disks do go bad, but the 
behavior tends to be rather different when they do (usually you will get 
read errors with uncorrectable CRC failures, and you'd know that _very_ 
clearly).

Sure, I could imagine something like the sector remapping flaking out on 
you, but that sounds really unlikely. Especially since:

> > But it *sounds* like the objects you lost were literally old ones, no? Ie
> > the lost stuff wasn't something you had committed in the last five minutes
> > or so? If so, then you really do seem to have a filesystem that corrupts
> > *old* files when it crashes. That's fairly scary. What FS is it?
> 
> No, in fact I had just committed those changes not 10 minutes before
> the panic.  Last time they were also fresh changes, although perhaps
> older than 10 minutes.  I can't remember.

Oh, ok. If so, then this is much less worrisome, and is in fact almost 
"normal" HFS+ behaviour. It is a journaling filesystem, but it only 
journals metadata, so the filenames and inodes will be fine after a crash, 
but the contents will be random.

[ Yeah, yeah, I know - it sounds rather stupid, but it's a common kind of 
  stupidity. The journaling essentially only protects the things that fsck 
  can check. Ext3 does similar things in "writeback" mode - but you should 
  use "data=ordered", which writes out the data before the metadata.

  Basically, such journaling doesn't help data integrity per se, but it 
  does mean that the metadata is ok, and that in turn means that while the 
  file contents won't be dependable, at least things like free block 
  bitmaps etc hopefully are.

  That in turn hopefully means that new file allocations won't be 
  crapping out all over old ones etc due to bad resource allocations, so 
  while it doesn't mean that the data is trustworthy, it at least means 
  that you can trust _some_ things. ]
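
To make that concrete, here's a rough illustration - made-up file names, 
not git code - of what a metadata-only journal does and doesn't buy you: 
the rename() is what survives a crash, the contents are only safe once 
fsync() has returned.

/*
 * Illustration only: on a filesystem that journals just metadata, the
 * rename() below is durable after a crash, but the data written into
 * the file is not, unless fsync() completed first.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int write_durably(const char *tmp, const char *final,
		  const void *buf, size_t len)
{
	int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	/* the metadata journal makes the rename itself survive a crash */
	return rename(tmp, final);
}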

If your machine crashes often, you could trivially add a "sync" to your 
commit hook. That would make things better. And maybe we should have a 
"safe mode" that does these things more carefully. You would definitely 
want to turn it on on that machine.
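
(The hook side of that is as dumb as it sounds: all it has to do is get 
the dirty blocks flushed, which is exactly what sync(2) does. A 
hypothetical stand-alone version, if you wanted it as a C program rather 
than a one-line script:)

/* Hypothetical post-commit hook: flush all dirty filesystem blocks
 * after every commit.  sync(2) schedules everything dirty for
 * writeout. */
#include <unistd.h>

int main(void)
{
	sync();
	return 0;
}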

Are you doing something special to make the machine crash so much? Or do 
OS X machines always crash, and Apple PR is just so good that people 
aren't aware of it?

Anyway, I'll think about sane ways to add a "safe" mode without making it 
_too_ painful. In the meantime, here's a trial patch that you should 
probably use. It does slow things down, but hopefully not too much.

(I really don't much like it - but I think this is a good change, and I 
just need to come up with a better way to do the fsync() than to be 
totally synchronous about it.)

It's going to make big "git add" calls *much* slower, so I'm not very 
happy about it (especially since we don't actually care that deeply about 
the files really being there until much later, so doing something 
asynchronous would be perfectly acceptable), but for you this is 
definitely worthwhile.
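
(The patch below also assumes a fsync_or_die() helper that isn't part of 
the diff itself; a minimal sketch of it - the real thing would sit next 
to the other *_or_die() helpers and use git's die() - is just:)

#include <errno.h>
#include <string.h>
#include <unistd.h>

extern void die(const char *err, ...);	/* git's usual fatal-error routine */

void fsync_or_die(int fd, const char *msg)
{
	if (fsync(fd) < 0)
		die("%s: fsync error (%s)", msg, strerror(errno));
}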

			Linus

---
 sha1_file.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index adcf37c..86a653b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2105,6 +2105,15 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 	return 0;
 }
 
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	fsync_or_die(fd, "sha1 file");
+	fchmod(fd, 0444);
+	if (close(fd) != 0)
+		die("unable to write sha1 file");
+}
+
 static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 			      void *buf, unsigned long len, time_t mtime)
 {
@@ -2170,9 +2179,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 
 	if (write_buffer(fd, compressed, size) < 0)
 		die("unable to write sha1 file");
-	fchmod(fd, 0444);
-	if (close(fd))
-		die("unable to write sha1 file");
+	close_sha1_file(fd);
 	free(compressed);
 
 	if (mtime) {
@@ -2350,9 +2357,7 @@ int write_sha1_from_fd(const unsigned char *sha1, int fd, char *buffer,
 	} while (1);
 	inflateEnd(&stream);
 
-	fchmod(local, 0444);
-	if (close(local) != 0)
-		die("unable to write sha1 file");
+	close_sha1_file(local);
 	SHA1_Final(real_sha1, &c);
 	if (ret != Z_STREAM_END) {
 		unlink(tmpfile);
--