On Tue, 10 Jun 2008, Denis Bueno wrote:
>
> > Do you have some odd filesystem in play? Was the current corruption in a
> > similar environment as the old one? IOW, I'm trying to find a pattern
> > here, to see if there might be something we can do about it..
>
> I can't remember if the old one happened after a panic or not, but I'd
> bet it did. The filesystem is HFS+, as indeed most OS X 10.4
> installations are. Maybe the HD has been going south? However, that
> doesn't seem likely, since when I got the computer it was new, and
> that was around Jun 2007.

Yeah, it's almost certainly not the disk. Disks do go bad, but the
behavior tends to be rather different when they do (usually you will get
read errors with uncorrectable CRC failures, and you'd know that _very_
clearly).

Sure, I could imagine something like the sector remapping flaking out on
you, but that sounds really unlikely. Especially since:

> > But it *sounds* like the objects you lost were literally old ones, no? Ie
> > the lost stuff wasn't something you had committed in the last five minutes
> > or so? If so, then you really do seem to have a filesystem that corrupts
> > *old* files when it crashes. That's fairly scary. What FS is it?
>
> No, in fact I had just committed those changes not 10 minutes before
> the panic. Last time they were also fresh changes, although perhaps
> older than 10 minutes. I can't remember.

Oh, ok. If so, then this is much less worrisome, and is in fact almost
"normal" HFS+ behaviour. It is a journaling filesystem, but it only
journals metadata, so the filenames and inodes will be fine after a
crash, but the contents will be random.

[ Yeah, yeah, I know - it sounds rather stupid, but it's a common kind of
  stupidity. The journaling essentially protects only the things that
  fsck can find. Ext3 does similar things in "writeback" mode - which is
  why you should use "data=ordered", which writes out the data before the
  metadata.

  Basically, such journaling doesn't help data integrity per se, but it
  does mean that the metadata is ok, and that in turn means that while
  the file contents won't be dependable, at least things like free block
  bitmaps etc hopefully are. That in turn hopefully means that new file
  allocations won't be crapping out all over old ones due to bad resource
  allocations, so while it doesn't mean that the data is trustworthy, it
  at least means that you can trust _some_ things ]

If your machine crashes often, you could trivially add a "sync" to your
commit hook - that would make things better (a two-line example is at the
bottom of this mail). And maybe we should have a "safe mode" that does
these things more carefully. You would definitely want to turn it on on
that machine.

Are you doing something special to make the machine crash so much? Or do
OS X machines always crash, and Apple PR is just so good that people
aren't aware of it?

Anyway, I'll think about sane ways to add a "safe" mode without making it
_too_ painful. In the meantime, here's a trial patch that you should
probably use. It does slow things down, but hopefully not too much.

(I really don't much like it - I do think it's a good change, but I need
to come up with a better way to do the fsync() than to be totally
synchronous about it.)

It's going to make big "git add" calls *much* slower, so I'm not very
happy about it (especially since we don't actually care that deeply about
the files really being there until much later, so doing something
asynchronous would be perfectly acceptable), but for you this is
definitely worthwhile.
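A note before the patch: it calls fsync_or_die(), which is outside the
diff context below. For reference, here is a minimal sketch of what such
a helper looks like - a reconstruction for illustration, not the verbatim
git source (the real helper lives elsewhere in the tree):

	#include <errno.h>
	#include <string.h>
	#include <unistd.h>

	/* git's usual fatal-error helper, declared here for the sketch */
	extern void die(const char *fmt, ...);

	/* fsync() the fd, and die loudly if the data cannot reach disk */
	static void fsync_or_die(int fd, const char *msg)
	{
		if (fsync(fd) < 0)
			die("%s: fsync error (%s)", msg, strerror(errno));
	}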
		Linus

---
 sha1_file.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index adcf37c..86a653b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2105,6 +2105,15 @@ int hash_sha1_file(const void *buf, unsigned long len, const char *type,
 	return 0;
 }
 
+/* Finalize a file on disk, and close it. */
+static void close_sha1_file(int fd)
+{
+	fsync_or_die(fd, "sha1 file");
+	fchmod(fd, 0444);
+	if (close(fd) != 0)
+		die("unable to write sha1 file");
+}
+
 static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 			      void *buf, unsigned long len, time_t mtime)
 {
@@ -2170,9 +2179,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 
 	if (write_buffer(fd, compressed, size) < 0)
 		die("unable to write sha1 file");
-	fchmod(fd, 0444);
-	if (close(fd))
-		die("unable to write sha1 file");
+	close_sha1_file(fd);
 	free(compressed);
 
 	if (mtime) {
@@ -2350,9 +2357,7 @@ int write_sha1_from_fd(const unsigned char *sha1, int fd, char *buffer,
 	} while (1);
 	inflateEnd(&stream);
 
-	fchmod(local, 0444);
-	if (close(local) != 0)
-		die("unable to write sha1 file");
+	close_sha1_file(local);
 
 	SHA1_Final(real_sha1, &c);
 	if (ret != Z_STREAM_END) {
 		unlink(tmpfile);
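PS: the commit-hook workaround mentioned above really is a two-liner.
A sketch, assuming a POSIX shell (and remember to make the hook file
executable):

	#!/bin/sh
	# .git/hooks/post-commit: flush dirty buffers after each commit.
	# sync(1) is heavy-handed - it flushes *all* filesystems, not
	# just the one the repository lives on - but it is simple.
	sync

Note that a post-commit hook only fires at commit time, so loose objects
written by a bare "git add" would still depend on the fsync() in the
patch above.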