On 5/22/07, Junio C Hamano <junkio@xxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> writes: ---------------------------------------------------------------- > Subject: [PATCH] Alter sha1close() 3rd argument to request flush only > > update=0 suppressed writing the final SHA-1 but was not used. > Now final=0 suppresses SHA-1 finalization, SHA-1 writing, > and closing -- in other words, only flush the buffer. What it does is understandable but it somehow feels funny that "sha1close(file, hashresult, 0)" does _not_ close it (and does not hash either for obvious reasons ;-). I would say we should let it pass this round, but might want to separate the first part out into a separate "update hash and flush" function if we get more callers.
OK, see the list at the end.
---------------------------------------------------------------- > Subject: [PATCH 1/4] git-repack --max-pack-size: new file statics and code restructuring > > Add "pack_size_limit", the limit specified by --max-pack-size, > "written_list", the list of objects written to the current pack, > and "nr_written", the number of objects in written_list. > Put "base_name" at file scope again and add forward declarations. > Move write_index_file() call from cnd_pack_objects() to > write_pack_file() since only the latter will know how > many times to call write_index_file(). I would have split this part a bit differently. This is mostly about restructuring the code so that write_index_file() is called from write_pack_file(), which by itself is a very good change (but then we might have been better off passing basename as a parameter).
That would have worked too. I made base_name file scope b/c that's how it was before NP's immediatley previous patches.
You are not using written_list nor limit yet but are introducing them in this step, which feels not quite right. I usually compile stuff with -Werror, and if I ever have to bisect the series, this would bomb out for these unused variables. Not nice.
Very good point -- I did not think of that. I will make sure in the future each patch in a patchset has no warnings.
---------------------------------------------------------------- > Subject: [PATCH 2/4] git-repack --max-pack-size: write_{object,one}() respect pack limit > > With --max-pack-size, generate the appropriate write limit > for each object and check against it before each group of writes. > Update delta usability rules to handle base being in a previously- > written pack. Inline sha1write_compress() so we know the > exact size of the written data when it needs to be compressed. > Detect and return write "failure". Again, this is split somewhat wrongly, as you do not have a way to set max size from the caller, but more importantly, even though write_one and write_object knows to obey the limit (perhaps somebody bisecting this series later may set the limit under the debugger, to work around the lack of option parser), write_pack_file() does not notice zero return from write_one(); I would have added a check there to do: die("sorry, limit reached and we do not have code to split the pack yet.") which obviously can be updated in the next patch.
This makes sense to me. In a previous version of the patchset, I had some temporary code like you say (just some extra arguments, not any checking or die() calls) which was immediately changed/replaced in the next patch. Shawn or Nicolas didn't like that, so I migrated to the split I used here. I didn't really intend bisecting this patchset to do anything useful *with respect to max-pack-size functionality*, but bisecting it would be useful to detect if these patches had broken something else. But I can rethink that in the future.
---------------------------------------------------------------- > Subject: [PATCH 3/4] git-repack --max-pack-size: split packs as asked by write_{object,one}() > > Rewrite write_pack_file() to break to a new packfile > whenever write_object/write_one request it, and > correct the header's object count in the previous packfile. > Change write_index_file() to write an index > for just the objects in the most recent packfile. > > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c > --- a/builtin-pack-objects.c > +++ b/builtin-pack-objects.c > @@ -652,7 +663,26 @@ static void write_pack_file(void) > + } while (nr_remaining && i < nr_objects); > + > + for (j = 0; i < nr_objects; i++) { > + struct object_entry *e = objects + i; > + j += !e->offset && !e->preferred_base; > + } > + if (j) > + die("wrote %u objects as expected but %u unwritten", written, j); I am a bit confused by this loop. Don't you have to start with i=0 for this check to be meaningful?
The previous do-while loop whose last line is shown could end with i < nr_objects due to nr_remaining becoming 0. This means the objects [i ... nr_objects) have not been inspected. They should all be either already written or non-writable, which is what the two terms in the && expression are testing. See list at end.
---------------------------------------------------------------- > Subject: [PATCH 4/4] git-repack --max-pack-size: add option parsing to enable feature > > Add --max-pack-size parsing and usage messages. > Upgrade git-repack.sh to handle multiple packfile names, > and build packfiles in GIT_OBJECT_DIRECTORY not GIT_DIR. > Update documentation. > > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c > --- a/builtin-pack-objects.c > +++ b/builtin-pack-objects.c > @@ -1713,6 +1713,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) > + if (!prefixcmp(arg, "--max-pack-size=")) { > + char *end; > + pack_size_limit = strtoul(arg+16, &end, 0) * 1024 * 1024; > + if (!arg[16] || *end) > + usage(pack_usage); > + continue; > + } Hmmm. I was almost going to suggest to have this spelled in bytes, with suffixes like k/m/g. However, because wanting to limit a pack under 1.4MB does not make much sense these days, and because having to spell "up to 2GB" as 2047 is not too much trouble, I think this is Ok.
I thought about suffixes too but I chose this way b/c that's how git fast-import does it.
Shouldn't we have a safety to error out when --stdout and --max-pack-size are both given? Currently it silently ignores the limit, doesn't it?
Yes & yes. Previously there was disagreement on this point. One preference was to disallow the combination, another was that it was useful and should be supported. So I left the code in a state where a small follow-on patch could make the decision. The current behavior is not to disallow the combination, and it is well-defined (you get a sequence of packs on stdout, concatenated, whose headers indicate how many objects there are in the current pack and all following). I do not think --stdout && --max-pack-size is currently useful, and a follow-on patch should complain. See list at end.
> diff --git a/git-repack.sh b/git-repack.sh > --- a/git-repack.sh > +++ b/git-repack.sh > @@ -35,7 +36,7 @@ true) > -PACKTMP="$GIT_DIR/.tmp-$$-pack" > +PACKTMP="$GIT_OBJECT_DIRECTORY/.tmp-$$-pack" Although this is a good change, this hunk does not belong to this.
OK. If you decide not to keep this change here for historical clarity, I can add it back later.
Overall everything looks good, except some minor details noted above. Separation of the commits into logical steps does not need to be fixed up (they are already in 'next'), but follow-up patches might be needed.
OK, I will not edit and re-submit these patches. But I will submit two follow-on patches: Patch 1: * Pull "sha1flush()" or similar out of sha1close() inside csum-file.c. This will require some edits to callers. Patch 2: * Add a comment to "confusing loop" to explain what it's checking. * Complain about --stdout && --max-pack-size combination. This will have to happen tomorrow.
And I have to agree with Linus; responding this way was more cumbersome than it should have been.
Understood. Thanks, -- Dana L. How danahow@xxxxxxxxx +1 650 804 5991 cell - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html