Hi Junio, On Fri, Nov 29, 2019 at 01:39:30PM -0800, Junio C Hamano wrote: > Denton Liu <liu.denton@xxxxxxxxx> writes: > > > The code to test that objects were not missing from the packfile was > > duplicated many times. Extract the duplicated code into > > test_no_missing_in_packs() and use that instead. > > > > Refactor the resulting extraction so that if any git commands fail, > > their return codes are not silently lost. > > > > We were using sed to filter lines. Although not incorrect, this is > > exactly what grep is built for. Replace this invocation of sed with grep > > so that we use the correct tool for the job. > > Well, > > $ sed -n -e 's/required match/desired part of the line/p' > > is much much more approirate than > > $ grep -e "requred match" | > extract desired part of the line > > "grep" is better only if the original were > > $ sed -n -e '/required match/p' > > but everybody would write it with grep to begin with ;-) This was what I was intending. It was originally written like the above and it made sense to convert it to use grep. I guess "filter lines" in my commit message is a little bit vague. Could we change this to "filter matching lines" perhaps? > > So, I dunno about this part of the conversion. > > > Instead of verifying each file of `alt_objects/pack/*.idx` individually > > in a for-loop, batch them together into one verification step. > > Do you mean this one? > > git verify-pack -v alt_objects/pack/*.idx > > where we may pass 1 or more .idx file to the command? At first my > reading was interrupted by a "Huh?", but that does look good. We'd > need to be a bit careful to make sure that we have at least 1 .idx > file, as the shell will happily feed a file whose name is "*.idx", > which verify-pack would be unhappy about. > > > The original testing construct was O(n^2): it used a grep in a loop to > > test whether any objects were missing in the packfile. Rewrite this to > > sort the files then use `comm -23` so that finding missing lines from > > the original file is done more efficiently. > > OK. If we an show measurable speedups, it would be great, but the > loop structure does look O(n^2) and unnecessary costly. > > > +test_no_missing_in_packs () { > > + myidx=$(ls -1 .git/objects/pack/*.idx) && > > + test_path_is_file "$myidx" && > > If there are 2 or more .idx files, or if there is none, $myidx would > hopefully be a concatenation of these filenames or a string that > ends with asterisk-dot-idx and would fail path_is_file. Sounds OK. > > Ah, I do not have to review this part---these are repeated patterns > in the original. > > > + git verify-pack -v alt_objects/pack/*.idx >orig.raw && > > + grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig && > > If output from 'grep' can be used as-is, it is worth doing, but if > you have to pipe it to cut, the original that used sed to filter and > edit the line would probably be a better way to write it. The original sed actually only filtered the line; no editing done. The cut invocation was a consequence of using comm. Previously, in the while loop, we would separate the line into `sha1` and `rest` components and only match using the `sha1`. Since we use comm now, we have to use cut to grab the sha1 and omit the rest of the line. We could rewrite it with sed like this: sed -n -e "/^[0-9a-f]\{40\}/s/^\($[0-9a-f]\{40\}\).*/\1/" orig.raw but I believe that breaking it into grep and cut makes the intent much more clear. What do you think? (By the way, if I were to reroll this series, should I keep sending out the entire patchset? It feels very noisy to send out 20-something emails every reroll when I'm just making a small one or two line change.) Thanks, Denton > > > + git verify-pack -v $myidx >dest.raw && > > This part does not quote $myidx" (inherited from the original); it > probably is OK, as any potentially problematic value in $myidx would > have been caught as an error much earlier in this test. >