On Mon, Jun 21, 2021 at 05:03:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a segfault in the --stdin-packs option added in
> 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
> 2021-02-22). The read_packs_list_from_stdin() function didn't check
> that the lines it was reading were valid packs, and thus when doing
> the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

It may be worth mentioning that the util pointer is used to associate the names of included/excluded packs with the packed_git structs they correspond to. I see it's mentioned in the very next paragraph, but it may be helpful for other readers to see this information earlier.

> The logic error was in assuming that we could iterate all packs and
> annotate the excluded and included packs we got, as opposed to
> checking the lines we got on stdin. There was a check for excluded
> packs, but included packs were simply assumed to be valid.
>
> As noted in the test we'll not report the first bad line, but whatever
> line sorted first according to the string-list.c API. In this case I
> think that's fine.

Yeah. There isn't really a better way to do that, since we don't have a convenient function to look up packs by their name. It is much more convenient to loop through all packs and assign them to entries in the string_list one by one. That's O(n*log(n)), but it doesn't really matter here, since we expect n to be small-ish, and this is far from the most expensive part of writing a pack.

You could imagine doing something O(n^2) by looping through all packs each time you receive a line of input. That performs worse, but arguably provides a better experience when using this mode interactively, since a bad line could be reported as soon as it is read. But interactive use is probably rare, so it likely doesn't matter.

Equally, you could build a mapping from pack name to packed_git struct ahead of time, and then do the lookups in constant time. That's linear, of course, but you pay for it in memory.
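
For illustration only, here is a minimal standalone sketch (not git's code) of the first two strategies. The `struct pack` and `struct request` types and the `resolve_requests()` and `find_pack_linear()` helpers are hypothetical stand-ins for packed_git, string_list_item and the string-list.c API; qsort() and bsearch() play the role of the sorted string_list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's packed_git. */
struct pack {
	const char *name;
};

/* Hypothetical stand-in for a string_list_item read from stdin. */
struct request {
	const char *name;   /* line read from stdin */
	struct pack *util;  /* set once a matching pack is found */
};

static int cmp_request(const void *a, const void *b)
{
	const struct request *ra = a, *rb = b;
	return strcmp(ra->name, rb->name);
}

/*
 * Sort the requested names once, then walk every known pack and
 * binary-search for its name: O(n log m) for n packs and m requests.
 * Returns the first unresolved request in strcmp() order (not input
 * order), or NULL if every request matched a pack.
 */
static const struct request *resolve_requests(struct request *reqs, size_t nr_reqs,
					      struct pack *packs, size_t nr_packs)
{
	size_t i;

	if (!nr_reqs)
		return NULL;
	assert(reqs != NULL);

	qsort(reqs, nr_reqs, sizeof(*reqs), cmp_request);
	for (i = 0; i < nr_packs; i++) {
		struct request key = { packs[i].name, NULL };
		struct request *hit = bsearch(&key, reqs, nr_reqs,
					      sizeof(*reqs), cmp_request);
		if (hit)
			hit->util = &packs[i];
	}

	/* Same idea as the new check: every request needs a pack. */
	for (i = 0; i < nr_reqs; i++)
		if (!reqs[i].util)
			return &reqs[i];
	return NULL;
}

/*
 * Alternative: resolve each line as it is read by scanning all packs.
 * O(n) per line, O(n * m) overall, but a caller can stop at the first
 * bad line in input order.
 */
static struct pack *find_pack_linear(const char *name,
				     struct pack *packs, size_t nr_packs)
{
	size_t i;

	for (i = 0; i < nr_packs; i++)
		if (!strcmp(packs[i].name, name))
			return &packs[i];
	return NULL;
}
```

With requests "pack-z.pack", "pack-a.pack" and "pack-y.pack" (in that input order) and only pack-a and pack-b present, resolve_requests() reports "pack-y.pack" rather than the first bad line read ("pack-z.pack"). That is the same quirk the test comment describes.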
Honestly, the memory cost is probably quite reasonable, but it may not be worth the effort, since I suspect the vast majority of usage here is from 'git repack --geometric'.

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> ---
>  builtin/pack-objects.c | 10 ++++++++++
>  t/t5300-pack-object.sh | 18 ++++++++++++++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index de00adbb9e0..65579e09fe0 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3310,6 +3310,16 @@ static void read_packs_list_from_stdin(void)
>  		item->util = p;
>  	}
>
> +	/*
> +	 * Arguments we got on stdin may not even be packs. Check that
> +	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
> +	 */

Could be worth adding "excluded packs are handled below".

> +	for_each_string_list_item(item, &include_packs) {
> +		struct packed_git *p = item->util;
> +		if (!p)
> +			die(_("could not find pack '%s'"), item->string);
> +	}
> +
>  	/*
>  	 * First handle all of the excluded packs, marking them as kept in-core

...and it may be worth updating this comment with s/First/Then.

> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 65e991e3706..330deec656b 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -119,6 +119,24 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
>  	test_cmp err.expect err.actual
>  '
>
> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> +	cat >in <<-EOF &&
> +	$(git -C pack-object-stdin rev-parse one)
> +	$(git -C pack-object-stdin rev-parse two)
> +	EOF

It's not a big deal, but feeding a here-doc directly into `git pack-objects` is much more common in t5300 than first redirecting it to a separate file.

I probably would have written (in a sub-shell to avoid -C pack-object-stdin everywhere):

	cd pack-object-stdin &&
	test_must_fail git pack-objects --stdout --stdin-packs >/dev/null 2>actual <<-EOF
	$(git rev-parse one)
	$(git rev-parse two)
	EOF

Although the line is kind of long anyway (and it'd be even longer, since the subshell would get its own level of indentation). So I could entirely buy that you did this for readability, which is fine by me.

> +
> +	# We actually just report the first bad line in strcmp()
> +	# order, it just so happens that we get the same result under
> +	# SHA-1 and SHA-256 here. It does not really matter that we
> +	# report the first bad item in this obscure case, so this
> +	# oddity of the test is OK.
> +	cat >err.expect <<-EOF &&
> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> +	EOF
> +	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
> +	test_cmp err.expect err.actual

If we don't care which one is reported (and it just so happens that we'll get the first one in lexical order), I would be fine with

	test_i18ngrep "could not find pack" err.actual

too.

It would be good to get rid of this comment and put it in the patch message in more detail (instead of just referring to it as "[a]s noted in the test").

Thanks,
Taylor