On Mon, Feb 27, 2017 at 10:59:15PM -0800, Linus Torvalds wrote: > I saw that somebody is actually looking at doing this "well" - by > doing it in many smaller steps. I tried. I gave up. The "unsigned char > *sha1" model is so ingrained that doing it incrementally just seemed > like a lot of pain. But it might be the right approach. That somebody is brian carlson, cc'd. > which is pretty nasty. The good news is that my patch passes all the > tests, and while it's big it's mostly very very mindless, and a lot of > it looks like cleanups, and the lines are generally shorter, eg > > - const unsigned char *mb = result->item->object.oid.hash; > - if (!hashcmp(mb, current_bad_oid->hash)) { > > turns into > > + const hash_t *mb = &result->item->object.oid; > + if (!hashcmp(mb, current_bad_oid)) { > > but I ended up also renaming a lot of common "sha1" as "hash", which > adds to the noise in that patch. I think the preimage there is worse than it ought to be because it's mid-transition. "struct object" has an object_id, but the rest of the function hasn't been converted yet. So ultimately it should be: const struct object_id *mb = &result->item->object.oid; if (!oidcmp(mb, current_bad_oid)) It looks like you stuck your "hash_t" inside "struct object_id", which I think is redundant. They're both trying to solve the same problem. > NOTE! It doesn't actually _fix_ the SHA1-centricity in any way, but it > makes it a bit more obvious where the bigger problems are. Not that > anybody would be surprised by what they are, but as part of writing > the patch it did kind of pinpoint most of them, and about 30 of those > new lines are added > > /* FIXME! Hardcoded hash sizes */ > /* FIXME! Lots of fixed-size hashes */ > /* FIXME! Fixed 20-byte hash usage */ Yeah, a lot of brian's patches have been focused around the fixing the related size assumptions. We've got GIT_SHA1_HEXSZ which doesn't solve the problem, but at least makes it easy to find. And a big improvement in the most recent series is a parse_oid() function that lets you parse object-ids left-to-right without knowing the size up front. So things like: if (len > 82 && !get_sha1_hex(buf, sha1_a) && get_sha1_hex(buf + 41, sha1_b)) becomes more like: if (parse_oid(p, oid_a, &p) && *p++ == ' ' && parse_oid(p, oid_b, &p) && *p++ == '\n') Still, if you've done more conversion, it's probably worth showing it publicly. It might not end up used, but it may serve as a reference later. > And as part of the type safety, I do think I may have found a bug: > > show_one_mergetag(): > > strbuf_addf(&verify_message, "tag %s names a non-parent %s\n", > tag->tag, tag->tagged->oid.hash); > > note how it prints out the "non-parent %s", but that's a SHA1 hash > that hasn't been converted to hex. Hmm? Yeah, that's definitely a bug. I'm surprised that -Wformat doesn't complain, but I guess we'd need -Wformat-signedness (which triggers quite a lot of warnings related to "int"). It's especially bad for "%x", which is implicitly unsigned. -Peff