Re: [PATCH 4/8] git-repack --max-pack-size: add fixup_header_footer()

On 4/9/07, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> Nicolas Pitre <nico@xxxxxxx> wrote:
> > I'd be really tempted to create a pack v4 whose only change is to still
> > have the pack header at the beginning of the pack like we do today, but
> > include the header in the pack SHA1 computation at the end of the stream
> > only.  This way the pack SHA1 could be computed as the pack is
> > generated, and the header fixed up without having to read the entire
> > pack back.  I think it was Geert Bosch who proposed this and it makes
> > tons of sense IMHO.
>
> Yes.  If we really are heading in this direction of needing to
> correct object counts, we should make that change.  It's trivial
> to hang onto that header for the duration of the rest of the data
> processing, and tack it onto the end for final SHA-1 computation.

I like the property that when an SHA-1 appears at the end of a file,
it is a checksum of every byte before it.  The ideas above are a
departure from that.  Do we want this rule to be different for each file type?

Wouldn't the following address the "object count unknown
at the start of sequential pack writing" problem:
 Write 0 for object count in the header. This is a flag to look for
 another header of same format just before the final SHA-1 which
 has the correct count. The SHA-1 is still a checksum of everything
 before it and no seeking/rewriting is needed on generation.  When
 reading the object count from a .pack file, you might need to add
     xread(pack_fd, &header, sizeof(header));
+    if (!header.object_count) {
+      lseek(pack_fd, -(off_t)(20 + sizeof(header)), SEEK_END);
+      xread(pack_fd, &header, sizeof(header));
+    }
 Or maybe you want this second header placed before the object_list_sha1
 instead (changing the 20 above to 40).
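
To make the proposal concrete, here is a minimal, self-contained sketch of
the read side. It is only an illustration under stated assumptions, not git
code: it uses stdio instead of xread()/lseek(), the helper names
write_pack_header() and read_object_count() are made up for this example,
and the only layout it relies on is the 12-byte pack header ("PACK",
version, object count, all big-endian) followed at the very end of the file
by a 20-byte SHA-1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PACK_HDR_SIZE  12  /* "PACK" + 4-byte version + 4-byte object count */
#define PACK_SHA1_SIZE 20  /* trailing SHA-1 checksum */

/* Store and load 32-bit values in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: append one 12-byte header carrying `count`. */
static int write_pack_header(FILE *f, uint32_t count)
{
    unsigned char hdr[PACK_HDR_SIZE];

    memcpy(hdr, "PACK", 4);
    put_be32(hdr + 4, 2);       /* pack version */
    put_be32(hdr + 8, count);   /* 0 means "see the copy before the SHA-1" */
    return fwrite(hdr, 1, sizeof(hdr), f) == sizeof(hdr) ? 0 : -1;
}

/*
 * Hypothetical helper: fetch the object count.  A zero count in the
 * leading header is treated as a flag to read a second header that sits
 * immediately before the trailing SHA-1.  Returns 0 on success, -1 on error.
 */
static int read_object_count(FILE *f, uint32_t *count)
{
    unsigned char hdr[PACK_HDR_SIZE];

    assert(count != NULL);
    if (fseek(f, 0L, SEEK_SET) != 0 ||
        fread(hdr, 1, sizeof(hdr), f) != sizeof(hdr) ||
        memcmp(hdr, "PACK", 4) != 0)
        return -1;

    if (get_be32(hdr + 8) == 0) {
        /* Seek back from the end over the SHA-1 and the second header. */
        if (fseek(f, -(long)(PACK_SHA1_SIZE + PACK_HDR_SIZE), SEEK_END) != 0 ||
            fread(hdr, 1, sizeof(hdr), f) != sizeof(hdr) ||
            memcmp(hdr, "PACK", 4) != 0)
            return -1;
    }
    *count = get_be32(hdr + 8);
    return 0;
}
```

A writer following this scheme would call write_pack_header(f, 0) up front,
stream the objects, then call write_pack_header(f, n) with the real count
just before emitting the SHA-1, so nothing is ever rewritten. One side
effect of the layout: for a genuinely empty pack (header plus SHA-1 only),
the backward seek lands on the leading header again and the count is still
read as 0.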

Finally, when I generate several 2GB split packfiles, I do notice
the slight delay for fixup_header_footer(), and I do think it's a bit
ugly, but in quantitative terms it's an insignificant part of a long
operation that's infrequently performed.  Does this need to be
optimized at all?

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell