Re: [PATCH v5 22/26] t7700: consolidate code into test_no_missing_in_packs()

Denton Liu <liu.denton@xxxxxxxxx> · Mon, 2 Dec 2019 12:50:37 -0800

Hi Junio,

On Fri, Nov 29, 2019 at 01:39:30PM -0800, Junio C Hamano wrote:
> Denton Liu <liu.denton@xxxxxxxxx> writes:
> 
> > The code to test that objects were not missing from the packfile was
> > duplicated many times. Extract the duplicated code into
> > test_no_missing_in_packs() and use that instead.
> >
> > Refactor the resulting extraction so that if any git commands fail,
> > their return codes are not silently lost.
> >
> > We were using sed to filter lines. Although not incorrect, this is
> > exactly what grep is built for. Replace this invocation of sed with grep
> > so that we use the correct tool for the job.
> 
> Well, 
> 
>     $ sed -n -e 's/required match/desired part of the line/p'
> 
> is much much more approirate than 
> 
>     $ grep -e "requred match" |
>       extract desired part of the line
> 
> "grep" is better only if the original were
> 
>     $ sed -n -e '/required match/p'
> 
> but everybody would write it with grep to begin with ;-)

This was what I was intending. It was originally written like the above
and it made sense to convert it to use grep. I guess "filter lines" in
my commit message is a little bit vague. Could we change this to "filter
matching lines" perhaps?

> 
> So, I dunno about this part of the conversion.
> 
> > Instead of verifying each file of `alt_objects/pack/*.idx` individually
> > in a for-loop, batch them together into one verification step.
> 
> Do you mean this one?
> 
> 	git verify-pack -v alt_objects/pack/*.idx
> 
> where we may pass 1 or more .idx file to the command?  At first my
> reading was interrupted by a "Huh?", but that does look good.  We'd
> need to be a bit careful to make sure that we have at least 1 .idx
> file, as the shell will happily feed a file whose name is "*.idx",
> which verify-pack would be unhappy about.
> 
> > The original testing construct was O(n^2): it used a grep in a loop to
> > test whether any objects were missing in the packfile. Rewrite this to
> > sort the files then use `comm -23` so that finding missing lines from
> > the original file is done more efficiently.
> 
> OK.  If we an show measurable speedups, it would be great, but the
> loop structure does look O(n^2) and unnecessary costly.
> 
> > +test_no_missing_in_packs () {
> > +	myidx=$(ls -1 .git/objects/pack/*.idx) &&
> > +	test_path_is_file "$myidx" &&
> 
> If there are 2 or more .idx files, or if there is none, $myidx would
> hopefully be a concatenation of these filenames or a string that
> ends with asterisk-dot-idx and would fail path_is_file.  Sounds OK.
> 
> Ah, I do not have to review this part---these are repeated patterns
> in the original.
> 
> > +	git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
> > +	grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&
> 
> If output from 'grep' can be used as-is, it is worth doing, but if
> you have to pipe it to cut, the original that used sed to filter and
> edit the line would probably be a better way to write it.

The original sed actually only filtered the line; no editing done. The
cut invocation was a consequence of using comm. Previously, in the while
loop, we would separate the line into `sha1` and `rest` components and
only match using the `sha1`. Since we use comm now, we have to use cut
to grab the sha1 and omit the rest of the line.

We could rewrite it with sed like this:

	sed -n -e "/^[0-9a-f]\{40\}/s/^\($[0-9a-f]\{40\}\).*/\1/" orig.raw

but I believe that breaking it into grep and cut makes the intent much
more clear. 

What do you think?

(By the way, if I were to reroll this series, should I keep sending out
the entire patchset? It feels very noisy to send out 20-something emails
every reroll when I'm just making a small one or two line change.)

Thanks,

Denton

> 
> > +	git verify-pack -v $myidx >dest.raw &&
> 
> This part does not quote $myidx" (inherited from the original); it
> probably is OK, as any potentially problematic value in $myidx would
> have been caught as an error much earlier in this test.
>