Re: Bug report: git -L requires excessive memory.

On Sun, Oct 30, 2022 at 01:59:41AM +0900, man dog wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
> 
> What did you do before the bug happened? (Steps to reproduce your issue)
> git log -L /regex/,/regex/:myfile to a repo in which 2MB text file is
> committed about 2800 times.
> 
> What did you expect to happen? (Expected behavior)
> get the result.
> 
> What happened instead? (Actual behavior)
> fatal: Out of memory, malloc failed (tried to allocate 2346801 bytes)

Thanks for the report and the reproduction recipe.

This is not a buggy allocation (the size matches the size of the test
file + 1 byte), but the line-level log apparently leaks some memory
for each commit modifying the file in question, and in your case their
combined size is excessive because of the somewhat big file that is
modified in every commit.

'line-log.c' contains two "NEEDSWORK leaking like a sieve" comments,
but you managed to stumble upon yet another case (those two are in the
code path handling merge commits, but your history is linear).

The patch below plugs this leak.

  ---  >8  ---

diff --git a/line-log.c b/line-log.c
index 51d93310a4..b6ea82ac6b 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1195,6 +1195,9 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 	if (parent)
 		add_line_range(rev, parent, parent_range);
 	free_line_log_data(parent_range);
+	for (int i = 0; i < queue.nr; i++)
+		diff_free_filepair(queue.queue[i]);
+	free(queue.queue);
 	return changed;
 }
 
  ---  8<  ---

> What's different between what you expected and what actually happened?
> The function requires too much memory.
> -n option should work for -L function.

Line-level log does work with '-n', but the file is so big, and is
modified so many times between the commits that do touch the specified
line range, that by the time it gets to the 10th commit to show it has
already leaked over 4GB of memory.  Had you specified an even smaller
number of commits to show, it might have worked:

  $ for i in {1..7} ; do /usr/bin/time -f "n: $i  maxRSS: %M" git log -L /func_007\(/,/}$/:test.txt -n $i >/dev/null || break ; done
  n: 1  maxRSS: 531192
  n: 2  maxRSS: 989504
  n: 3  maxRSS: 1447900
  n: 4  maxRSS: 1906408
  n: 5  maxRSS: 2364740
  n: 6  maxRSS: 2823148
  n: 7  maxRSS: 3282360

In your reproduction recipe the given line range is modified every 100
commits and there are 3000 commits in total, so I estimate the total
memory usage to be somewhere around 13.5GB.  With the patch above it
tops out at around 260MB.
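
For what it's worth, here is a rough sketch of how such an estimate can
be derived from the maxRSS numbers above.  It assumes the growth stays
linear in the number of shown commits and that 3000 / 100 = 30 commits
end up being shown; both are simplifications for illustration, and the
variable names are made up.

```shell
#!/bin/bash
# maxRSS values in KB, copied from the '-n 1' .. '-n 7' runs above.
rss=(531192 989504 1447900 1906408 2364740 2823148 3282360)

# Average growth per additional shown commit (assumes linear growth).
n=${#rss[@]}
per_shown_kb=$(( (rss[n-1] - rss[0]) / (n - 1) ))

# Assumption: 3000 commits, line range modified every 100 commits,
# hence about 30 commits to show in total.
shown=$(( 3000 / 100 ))
total_kb=$(( rss[0] + per_shown_kb * (shown - 1) ))

echo "growth per shown commit: ${per_shown_kb} KB"
echo "extrapolated peak:       $(( total_kb / 1024 )) MiB"
```

The extrapolation lands in the same ballpark as the figures reported
for the unpatched Linux build in the script header below.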


> Anything else you want to add:
> I made a script to reproduce this. Please run the script below.
> Results in each environments are in its header.
> A workaround which is given in other BBS is included also.
> 
> 
> 
> 
> #!/bin/bash
> #
> # Bug report: git -L requires excessive memory.
> # Run this script to reproduce
> #
> # MINGW32(git version 2.38.1.windows.1) fatal: Out of memory, malloc
> # failed (tried to allocate 2346801 bytes)
> # MINGW64(git version 2.38.1.windows.1) requires  8.6GB
> # Linux64(git version 2.20.1          ) requires 13.1GB
> #
> 
> git --version
> 
> if [ ! -d .git ]; then
>   git init
>   c=${1:-3000}
>   for (( i=0;i<c;i++)); do
>     gawk -v r="$i" '
>       BEGIN{
>         for (i=0;i<100;i++) {
>           if (r>=i) {
>             printf("function func_%03d(){ // revised at %d\n",i,
> int((r-i)/100)*100+i%100)
>             printf("  // contents of function\n")
>             printf("}\n")
>             make_subfuncs(i);
>           }
>         }
>         exit
>       }
>       function make_subfuncs(i,    j){
>         for (j=0;j<300;j++) {
>           printf("function func_%03d_sub%03d(){\n",i,j)
>           printf("  // contents of sub functions are NOT revised.\n")
>           printf("}\n")
>         }
>       }' > test.txt
>     git add test.txt
>     git commit -m "revision $i"
>   done
>   git gc
> fi
> 
> git log -L /func_007\(/,/}$/:test.txt # this command requires excessive memory.
> git log -L /func_007\(/,/}$/:test.txt -n 10 # -n option doesn't work also.
> #git log -L /func_007\(/,/}$/:test.txt HEAD~10..HEAD~0 # this works.

Perhaps I misunderstood, but I got the impression that you think that
'HEAD~10..HEAD~0' and '-n 10' do the same.

They do not: 'HEAD~10..HEAD~0' means to process only the last ten
commits, so it can't leak all that much, and that's why it worked.
'-n 10', however, means to _show_ only ten commits, but process as
many commits as necessary to find those ten.  In your case, with the
line range being modified every 100 commits, that amounts to processing
over 1000 commits.
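
The difference can be seen on a small scale with the sketch below.  It
uses plain path limiting instead of '-L' to keep the output easy to
count, so it is only an analogy; the throwaway repository, the file
names and the 'g' helper are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical throwaway repository; all names are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
# Helper so that committing works without any global configuration.
g() { git -c user.name=demo -c user.email=demo@example.com \
          -c commit.gpgsign=false "$@"; }

# 20 commits; tracked.txt is modified only in every 5th one.
for i in $(seq 1 20); do
  if (( i % 5 == 0 )); then
    echo "rev $i" > tracked.txt
  else
    echo "rev $i" > other.txt
  fi
  g add -A
  g commit -q -m "revision $i"
done

# '-n 3' limits what is shown: git keeps walking back through history
# until it has found three commits touching tracked.txt.
shown_n=$(git log --format=%s -n 3 -- tracked.txt | wc -l | tr -d ' ')

# 'HEAD~3..HEAD' limits what is processed: only the last three commits
# are candidates, no matter how many of them touch tracked.txt.
shown_range=$(git log --format=%s HEAD~3..HEAD -- tracked.txt | wc -l | tr -d ' ')

echo "-n 3          shows $shown_n commit(s)"
echo "HEAD~3..HEAD  shows $shown_range commit(s)"
```

Here '-n 3' has to go back to revision 10 to collect three matching
commits, while the revision range only ever looks at revisions 18 to
20 and finds a single match.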

> #
> # This can be a workaround
> #
> step=50
> num=`git log | grep -c commit`
> for ((i=0;i<$num;i+=$step)); do
>   end=$((i+$step))
>   range=HEAD~$end..HEAD~$i
>   if [ $end -ge $num ]; then
>     range=HEAD~$i
>   fi
> #  echo $range
>   git --no-pager log -L /func_007\(/,/}$/:test.txt $range
> done
> 
> 
> 
> 
> [System Info]
> [Enabled Hooks]


