[Hmm, nobody seems ot have commented on this analysis; maybe reposting it with a subject containing [BUG] will help?] Samuel Bronson <naesten@xxxxxxxxx> writes: > The following message is a courtesy copy of an article > that has been posted to gmane.comp.version-control.git as well. > > Oh, I forgot to provide any analysis of the problem. Oops. > > It may be just as well, though; I was tired enough that I might have > botched it in any case. So, have an analysis: > > While inflate errors are obviously NOT GOOD, and should perhaps be > instantly fatal for most commands, git fsck is something of a special > case because it is useful to have *it* report as many corrupt objects as > possible in one run. > > Unfortunately, this is not currently the case, as shown by the provided > testcase. > > The output for this testcase is: > > ,---- > | checking known breakage: > | hash1=ffffffffffffffffffffffffffffffffffffffff && > | hash2=fffffffffffffffffffffffffffffffffffffffe && > | mkdir -p .git/objects/ff && > | echo not-zlib >$(sha1_file $hash1) && > | test_when_finished "remove_object $hash1" && > | echo not-zlib >$(sha1_file $hash2) && > | test_when_finished "remove_object $hash2" && > | > | # Return value is not documented > | test_might_fail git fsck 2>out && > | cat out && echo ====== && > | grep "$hash1.*corrupt" out && > | grep "$hash2.*corrupt" out > | > | error: inflate: data stream error (incorrect header check) > | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header > | error: inflate: data stream error (incorrect header check) > | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored > | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is > | corrupt > | ====== > | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored > | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is > | corrupt > | not ok 5 - fsck survives inflate errors # TODO known breakage > `---- > > If I flip it from expect_failure to expect_success and run the test with > -i, then go into the trash directory and run "gdb ../../git-fsck", I can > obtain this (thoroughly rehearsed & trimmed) gdb transcript: > > ,---- > | % gdb ../../git-fsck > | GNU gdb (Debian 7.7.1-3) 7.7.1 > ... > | Reading symbols from ../../git-fsck...done. > | (gdb) break error > | Breakpoint 1 at 0x813d24c: file usage.c, line 143. > | (gdb) break die > | Breakpoint 2 at 0x813d152: file usage.c, line 94. > | (gdb) run > | Starting program: /home/naesten/hacking/git/git-fsck > | [Thread debugging using libthread_db enabled] > | Using host libthread_db library > | "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1". > | Checking object directories: 100% (256/256), done. > | > | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143 > | 143 { > | (gdb) bt > | #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143 > | #1 0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0) > | at zlib.c:144 > | #2 0x08125367 in unpack_sha1_header (stream=<optimized out>, > | map=<optimized out>, mapsize=<optimized out>, > | buffer=<optimized out>, bufsiz=<optimized out>) > | at sha1_file.c:1515 > | #3 0x08125546 in sha1_loose_object_info ( > | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>, > | oi=oi@entry=0xbfffe788) at sha1_file.c:2528 > | #4 0x08126b2d in sha1_object_info_extended ( > | sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1) > | at sha1_file.c:2565 > | #5 0x0812666f in sha1_object_info ( > | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0) > | at sha1_file.c:2601 > | #6 0x080f6941 in parse_object ( > | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247 > | #7 0x080758ac in fsck_sha1 ( > | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>) > | at builtin/fsck.c:333 > ... > | (gdb) c > | Continuing. > | error: inflate: data stream error (incorrect header check) > | > | Breakpoint 1, error (err=0x817c525 "unable to unpack %s header") > | at usage.c:143 > | 143 { > | (gdb) bt > | #0 error (err=0x817c525 "unable to unpack %s header") at usage.c:143 > | #1 0x08125564 in sha1_loose_object_info ( > | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>, > | oi=oi@entry=0xbfffe788) at sha1_file.c:2529 > | #2 0x08126b2d in sha1_object_info_extended ( > | sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1) > | at sha1_file.c:2565 > | #3 0x0812666f in sha1_object_info ( > | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0) > | at sha1_file.c:2601 > | #4 0x080f6941 in parse_object ( > | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247 > ... > | (gdb) frame 4 > | #4 0x080f6941 in parse_object ( > | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247 > | warning: Source file is more recent than executable. > | 247 sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error > | (gdb) down > | #3 0x0812666f in sha1_object_info ( > | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0) > | at sha1_file.c:2601 > | 2601 if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT) > | < 0) > | (gdb) fin > | Run till exit from #3 0x0812666f in sha1_object_info ( > | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0) > | at sha1_file.c:2601 > | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header > | parse_object (sha1=0x82659d4 '\377' <repeats 20 times>) > | at object.c:246 > | 246 (!obj && has_sha1_file(sha1) && > | Value returned is $1 = -1 > | (gdb) c > | Continuing. > | > | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143 > | 143 { > | (gdb) bt > | #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143 > | #1 0x081452ff in git_inflate (strm=0xbfffe710, flush=0) > | at zlib.c:144 > | #2 0x08125367 in unpack_sha1_header (stream=<optimized out>, > | map=<optimized out>, mapsize=<optimized out>, > | buffer=<optimized out>, bufsiz=<optimized out>) > | at sha1_file.c:1515 > | #3 0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000, > | mapsize=<optimized out>, type=type@entry=0xbfffe7d8, > | size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>) > | at sha1_file.c:1620 > | #4 0x08126024 in read_object ( > | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>, > | type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc) > | at sha1_file.c:2667 > | #5 0x08126c33 in read_sha1_file_extended ( > | sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8, > | size=0xbfffe7dc, flag=1) at sha1_file.c:2690 > | #6 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8, > | sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853 > | #7 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>) > | at object.c:256 > ... > | (gdb) c > | Continuing. > | error: inflate: data stream error (incorrect header check) > | > | Breakpoint 2, die ( > | err=0x817cdbc "loose object %s (stored in %s) is corrupt") > | at usage.c:94 > | 94 { > | (gdb) bt > | #0 die (err=0x817cdbc "loose object %s (stored in %s) is corrupt") > | at usage.c:94 > | #1 0x08126cc1 in read_sha1_file_extended ( > | sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8, > | size=0xbfffe7dc, flag=1) at sha1_file.c:2705 > | #2 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8, > | sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853 > | #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>) > | at object.c:256 > ... > | (gdb) frame 3 > | #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>) > | at object.c:256 > | 256 buffer = read_sha1_file(sha1, &type, &size); // <-- death > | (gdb) > `---- > > It's probably clearest what's happened here if I just show you this ... > > From object.c: > > struct object *parse_object(const unsigned char *sha1) > { > unsigned long size; > enum object_type type; > int eaten; > const unsigned char *repl = lookup_replace_object(sha1); > void *buffer; > struct object *obj; > > obj = lookup_object(sha1); > if (obj && obj->parsed) > return obj; > > if ((obj && obj->type == OBJ_BLOB) || > (!obj && has_sha1_file(sha1) && > sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error > if (check_sha1_signature(repl, NULL, 0, NULL) < 0) { > error("sha1 mismatch %s", sha1_to_hex(repl)); > return NULL; > } > parse_blob_buffer(lookup_blob(sha1), NULL, 0); > return lookup_object(sha1); > } > > buffer = read_sha1_file(sha1, &type, &size); // <-- death > if (buffer) { > if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) { > free(buffer); > error("sha1 mismatch %s", sha1_to_hex(repl)); > return NULL; > } > > obj = parse_object_buffer(sha1, type, size, buffer, &eaten); > if (!eaten) > free(buffer); > return obj; > } > return NULL; > } > > (I've clearly added some marginal comments to my copy. ;-) > > The first two lines of output > >> error: inflate: data stream error (incorrect header check) >> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header > > come from deep within the call "sha1_object_info(sha1, NULL)". > > The next two lines > >> error: inflate: data stream error (incorrect header check) >> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored >> in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is >> corrupt > > and death come from the call "read_sha1_file(sha1, &type, &size)", which > is fair because that's exactly what the documentation comment above > read_sha1_file_extended() *says* will happen: > > /* > * This function dies on corrupt objects; the callers who want to > * deal with them should arrange to call read_object() and give error > * messages themselves. > */ > > So, given that parse_object()'s documentation is: > > /* > * Returns the object, having parsed it to find out what it is. > * > * Returns NULL if the object is missing or corrupt. > */ > > it probably should not call read_sha1_file() on a corrupt object. > > Options for fixing this would appear to include: > > 1. Saving the result of sha1_object_info(sha1, NULL) to a variable and > returning early if the object is corrupt. (But what happens if there > is corruption far enough in that it isn't seen when trying to grab > the object header?) > > 2. Calling read_object() and giving our own error messages. > > 3. Making read_sha1_file_extended only *optionally* die; since it's > calling die() directly. > > I'm also a bit nervous about this, though: > > static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size) > { > return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT); > } > > Do we really want that happening while scanning the loose objects? -- Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread! -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html