On Thu, May 27 2021, Jiang Xin wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote on Thu, May 27, 2021 at 2:51 AM:
>>
>>
>> On Mon, Jan 11 2021, Jiang Xin wrote:
>>
>> > From: Jiang Xin <zhiyou.jx@xxxxxxxxxxxxxxx>
>> >
>> > Move git-bundle related functions from t5510 to a library, and this lib
>> > will be shared with a new testcase t6020 which finds a known breakage of
>> > "git-bundle".
>> > [...]
>> > +
>> > +# Format the output of git commands to make a user-friendly and stable
>> > +# text.  We can easily prepare the expect text without having to worry
>> > +# about future changes of the commit ID and spaces of the output.
>> > +make_user_friendly_and_stable_output () {
>> > +    sed \
>> > +        -e "s/${A%${A#???????}}[0-9a-f]*/<COMMIT-A>/g" \
>> > +        -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" \
>> > +        -e "s/${C%${C#???????}}[0-9a-f]*/<COMMIT-C>/g" \
>> > +        -e "s/${D%${D#???????}}[0-9a-f]*/<COMMIT-D>/g" \
>> > +        -e "s/${E%${E#???????}}[0-9a-f]*/<COMMIT-E>/g" \
>> > +        -e "s/${F%${F#???????}}[0-9a-f]*/<COMMIT-F>/g" \
>> > +        -e "s/${G%${G#???????}}[0-9a-f]*/<COMMIT-G>/g" \
>> > +        -e "s/${H%${H#???????}}[0-9a-f]*/<COMMIT-H>/g" \
>> > +        -e "s/${I%${I#???????}}[0-9a-f]*/<COMMIT-I>/g" \
>> > +        -e "s/${J%${J#???????}}[0-9a-f]*/<COMMIT-J>/g" \
>> > +        -e "s/${K%${K#???????}}[0-9a-f]*/<COMMIT-K>/g" \
>> > +        -e "s/${L%${L#???????}}[0-9a-f]*/<COMMIT-L>/g" \
>> > +        -e "s/${M%${M#???????}}[0-9a-f]*/<COMMIT-M>/g" \
>> > +        -e "s/${N%${N#???????}}[0-9a-f]*/<COMMIT-N>/g" \
>> > +        -e "s/${O%${O#???????}}[0-9a-f]*/<COMMIT-O>/g" \
>> > +        -e "s/${P%${P#???????}}[0-9a-f]*/<COMMIT-P>/g" \
>> > +        -e "s/${TAG1%${TAG1#???????}}[0-9a-f]*/<TAG-1>/g" \
>> > +        -e "s/${TAG2%${TAG2#???????}}[0-9a-f]*/<TAG-2>/g" \
>> > +        -e "s/${TAG3%${TAG3#???????}}[0-9a-f]*/<TAG-3>/g" \
>> > +        -e "s/ *\$//"
>> > +}
>>
>> On one of the gcc farm boxes, a i386 box (gcc45) this fails because sed
>> gets killed after >500MB of memory use (I was just eyeballing it in
>> htop) on the "create bundle from special rev: main^!" test. This with GNU
>> sed 4.2.2.
>>
>> I suspect this regex pattern creates some runaway behavior in sed that's
>> since been fixed (or maybe it's the glibc regex engine?). The glibc is
>> 2.19-18+deb8u10:
>>
>>     + git bundle list-heads special-rev.bdl
>>     + make_user_friendly_and_stable_output
>>     + sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e
>>     s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e
>>     s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e
>>     s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e
>>     s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e
>>     s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e
>>     s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e
>>     s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e
>>     s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e
>>     s/[0-9a-f]*/<TAG-3>/g -e s/ *$//
>>     sed: couldn't re-allocate memory
>
> I wrote a program on macOS to check memory footprint for sed and perl.
> See:
>
>     https://github.com/jiangxin/compare-sed-perl

Interesting use of Go as a /usr/bin/time -v replacement :)

After changing your int64 to int32 and digging up how to cross-compile
Go I get similar results. That's because your test has actual short
SHA-1s in the "-e 's///g'" expressions, but notice how in the trace I
have they are e.g. "s/[0-9a-f]*/<COMMIT-A>/g", with no SHA-1 prefix at
all. That's the problem, so that Go command won't reproduce it.

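To make the difference between the two kinds of expression concrete, here is a minimal sketch. The object ID is made up for illustration and the variable names are hypothetical; only the expansion and the sed expressions are taken from the quoted code and trace.

```shell
#!/bin/sh
# Hypothetical object ID, made up for illustration only.
A=0123456789abcdef0123456789abcdef01234567
line="$A refs/heads/main"

# First seven characters of $A, using the same expansion as the
# quoted make_user_friendly_and_stable_output function.
prefix=${A%${A#???????}}

# Anchored on a real prefix: only the object ID is replaced.
with_prefix=$(printf '%s\n' "$line" |
    sed -e "s/${prefix}[0-9a-f]*/<COMMIT-A>/g")

# No prefix, as in the trace: "[0-9a-f]*" also matches the empty
# string, so the replacement can be inserted at many positions.
without_prefix=$(printf '%s\n' "$line" |
    sed -e "s/[0-9a-f]*/<COMMIT-A>/g")

printf '%s\n' "$with_prefix"
printf '%s\n' "$without_prefix"
```

The second result is several times longer than the first, and every further unanchored "-e" multiplies it again.
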
Anyway, changing the test to emit to "input" first and running this
shows it:

    avar@gcc45:/run/user/1632/git/t/trash directory.t6020-bundle-misc$ /usr/bin/time -v sed -e 's/[0-9a-f]*/<COMMIT-A>/g' -e 's/[0-9a-f]*/<COMMIT-B>/g' -e 's/[0-9a-f]*/<COMMIT-C>/g' -e 's/[0-9a-f]*/<COMMIT-D>/g' -e 's/[0-9a-f]*/<COMMIT-E>/g' -e 's/[0-9a-f]*/<COMMIT-F>/g' -e 's/[0-9a-f]*/<COMMIT-G>/g' -e 's/[0-9a-f]*/<COMMIT-H>/g' -e 's/[0-9a-f]*/<COMMIT-I>/g' -e 's/[0-9a-f]*/<COMMIT-J>/g' -e 's/[0-9a-f]*/<COMMIT-K>/g' -e 's/[0-9a-f]*/<COMMIT-L>/g' -e 's/[0-9a-f]*/<COMMIT-M>/g' -e 's/[0-9a-f]*/<COMMIT-N>/g' -e 's/[0-9a-f]*/<COMMIT-O>/g' -e 's/[0-9a-f]*/<COMMIT-P>/g' -e 's/[0-9a-f]*/<TAG-1>/g' -e 's/[0-9a-f]*/<TAG-2>/g' -e 's/[0-9a-f]*/<TAG-3>/g' -e 's/ *$//' <input
    sed: couldn't re-allocate memory
    Command exited with non-zero status 4
        Command being timed: "sed -e s/[0-9a-f]*/<COMMIT-A>/g -e s/[0-9a-f]*/<COMMIT-B>/g -e s/[0-9a-f]*/<COMMIT-C>/g -e s/[0-9a-f]*/<COMMIT-D>/g -e s/[0-9a-f]*/<COMMIT-E>/g -e s/[0-9a-f]*/<COMMIT-F>/g -e s/[0-9a-f]*/<COMMIT-G>/g -e s/[0-9a-f]*/<COMMIT-H>/g -e s/[0-9a-f]*/<COMMIT-I>/g -e s/[0-9a-f]*/<COMMIT-J>/g -e s/[0-9a-f]*/<COMMIT-K>/g -e s/[0-9a-f]*/<COMMIT-L>/g -e s/[0-9a-f]*/<COMMIT-M>/g -e s/[0-9a-f]*/<COMMIT-N>/g -e s/[0-9a-f]*/<COMMIT-O>/g -e s/[0-9a-f]*/<COMMIT-P>/g -e s/[0-9a-f]*/<TAG-1>/g -e s/[0-9a-f]*/<TAG-2>/g -e s/[0-9a-f]*/<TAG-3>/g -e s/ *$//"
        User time (seconds): 130.00
        System time (seconds): 2.42
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.41
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1030968
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 257333
        Voluntary context switches: 1
        Involuntary context switches: 12578
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 4

But no, the issue as it turns out is not Perl vs. sed, it's that
there's some bug in the shellscript / tooling version (happens with both
dash 0.5.7-4 and bash 4.3-11+deb8u2 on that box) where those expansions
like ${A%${A#??????0?}} resolve to nothing.

So if we make that:

    cat >input &&
    cat input >&2 &&
    sed -e "s/${A%${A#??????0?}}[0-9a-f]*/<COMMIT-A>/g" <input >input.tmp &&
    mv input.tmp input &&
    cat input >&2 &&
    sed -e "s/${B%${B#???????}}[0-9a-f]*/<COMMIT-B>/g" <input >input.tmp &&
    mv input.tmp input &&
    cat input >&2 &&

We get things like:

    + sed -e s/[0-9a-f]*/<COMMIT-A>/g
    + mv input.tmp input
    + cat input
    <COMMIT-A> <COMMIT-A>r<COMMIT-A>s<COMMIT-A>/<COMMIT-A>h<COMMIT-A>s<COMMIT-A>/<COMMIT-A>m<COMMIT-A>i<COMMIT-A>n<COMMIT-A>
    + sed -e s/[0-9a-f]*/<COMMIT-B>/g
    + mv input.tmp input
    + cat input
    <COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B> <COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>r<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>h<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>s<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>/<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>m<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>i<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>n<COMMIT-B><<COMMIT-B>C<COMMIT-B>O<COMMIT-B>M<COMMIT-B>M<COMMIT-B>I<COMMIT-B>T<COMMIT-B>-<COMMIT-B>A<COMMIT-B>><COMMIT-B>
    [...] etc.

I.e. it's the sed expression itself that's the issue: with an empty
prefix, each "-e" re-expands the placeholders inserted by the previous
one, so the output grows with every expression. You should be able to
reproduce this locally with something like:

    echo 0 | sed -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g' -e 's/[0-9]*/<BEGIN>0<END>/g'

If not just copy the -e a few more times.

Anyway, looking at this whole test file with fresh eyes this pattern
seems very strange. You duplicated most of test_commit with this
test_commit_setvar. It's a bit more verbose but why not just use:

    test_commit ...
    A=$(git rev-parse HEAD)

Or teach test_commit a --rev-parse option or something and:

    A=$(test_commit ...)

This make_user_friendly_and_stable_output then actually loses
information, e.g. sometimes the bundle output you're testing emits
trailing spaces, but the normalization function overzealously trims
that.

I think this whole thing would be much simpler with the above and then
something like:

    @@ -146,7 +126,8 @@ test_expect_success 'setup' '

         # branch main: merge commit I & J
         git checkout main &&
    -    test_commit_setvar --merge I topic/1 "Merge commit I" &&
    +    git merge --no-edit --no-ff -m"Merge commit I" topic/1 &&
    +    I=$(git rev-parse HEAD) &&
         test_commit_setvar --merge J refs/pull/2/head "Merge commit J" &&

         # branch main: commit K
    @@ -172,18 +153,18 @@ test_expect_success 'create bundle from special rev: main^!' '

         git bundle list-heads special-rev.bdl |
             make_user_friendly_and_stable_output >actual &&
    -    cat >expect <<-\EOF &&
    -    <COMMIT-P> refs/heads/main
    +    cat >expect <<-EOF &&
    +    $P refs/heads/main
         EOF
         test_cmp expect actual &&

Or just add a --merge option to test_commit itself.
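
As an aside on the empty-expansion failure mode described above: if the abbreviating expansion is kept at all, one option might be to make it fail loudly instead of silently producing an empty prefix. This is only a sketch under stated assumptions. abbrev_oid is a hypothetical helper, not something in git's test library, and the object ID in the usage lines is made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of git's test library): print the
# first seven characters of an object ID, or fail when the value is
# empty or shorter than seven characters.
# Note: plain POSIX sh has no "local", so oid and short are globals.
abbrev_oid () {
    oid=$1
    # Same expansion as in the quoted patch.  When $oid has fewer than
    # seven characters the inner expansion removes nothing, so the
    # outer one strips the whole value and $short ends up empty.
    short=${oid%${oid#???????}}
    if test -z "$short"
    then
        echo "abbrev_oid: empty or too-short object ID: '$oid'" >&2
        return 1
    fi
    printf '%s\n' "$short"
}

# Usage with a made-up object ID:
A=0123456789abcdef0123456789abcdef01234567
abbrev_oid "$A"
abbrev_oid "" || echo "refused empty input"
```

With a guard like this, an unset variable would stop the test at the point of the mistake rather than turning into a runaway sed expression later on.
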