Junio C Hamano <junkio@xxxxxxx> wrote: > in the kernel repository, with three different implementations > of the "check-blob". > > - Checking with has_sha1_file() has negligible (unmeasurable) > performance penalty, which is what this patch has. > > - Checking with sha1_object_info() makes it somewhat slower, > perhaps by 5%. > > - Checking with read_sha1_file() to cause a fully re-validation > is prohibitively expensive (about 4 times as much runtime). > > I am tempted to make the version with has_sha1_file() the > default, getting rid of this new --check-blob option. Misses in has_sha1_file suck, as we rescan for packfiles after failing to find it in the current known packfiles and running a failed stat(). Per object. Though if --objects aborts with an error on the first failure, that's perfectly reasonable performance, and actually quite sane behavior. So I'm all for having --objects always imply has_sha1_file(). If the object exists enough to satisfy has_sha1_file() I think its sane enough to assume its safe for a fast-fetch path. If we didn't have fast-fetch implemented and we fetched <100 objects we'd use unpack-objects, which would only call has_sha1_file() anyway, and find that possibly corrupted object in the source repository..., oh, and that cannot happen as the source repository had to have a good copy of the damn object to send it to us in the first place. So has_sha1_file() is sufficient. In my opinion, I think the --check-blob option of rev-list should imply doing the sha1_object_info() type test at least, if not doing a full-up SHA-1 validation of the blob content. Which yea, that's basically most of an fsck... So I'm in favor of making --objects imply has_sha1_file(), aborting on the first false return from that, and either make --check-blob do the stricter checking work, or don't define it at this time. -- Shawn. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html