Re: Unpredictable peak memory usage when using `git log` command

Yuri Karnilaev <karnilaev@xxxxxxxxx> · Sat, 31 Aug 2024 13:24:35 +0300

Thanks, Peff!

I will try the recommendations you mentioned for reducing memory consumption in my task.

Have a nice day,
Yuri

> On 31. Aug 2024, at 0.06, Jeff King <peff@xxxxxxxx> wrote:
> 
> On Fri, Aug 30, 2024 at 03:20:15PM +0300, Yuri Karnilaev wrote:
> 
>> 2. Processing commits in batches:
>> ```
>> /usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 -n 1000 --skip=1000000 --numstat > 1.txt
>> ```
>> [...]
>> Operating System: Mac OS 14.6.1 (23G93)
>> Git Version: 2.39.3 (Apple Git-146)
> 
> I sent a patch which I think should make things better for you, but I
> wanted to mention two things in a more general way:
> 
>  1. You should really consider building a commit-graph file with "git
>     commit-graph write --reachable". That will reduce the memory usage
>     for this case, but also reduce CPU time quite a bit (we won't have
>     to open those million skipped commits to chase their parent
>     pointers).
> 
>     I haven't kept up with the defaults for writing graph files. I
>     thought gc.writeCommitGraph defaults to "true" these days, though
>     that wouldn't help in a freshly cloned repository (arguably we
>     should write the commit graph on clone?).
> 
>  2. Using "--skip" still has to traverse all of those intermediate
>     commits. So it's effectively quadratic in the number of commits
>     overall (you end up skipping the first 1000 over and over).
> 
>     It's been a while since I've had to "paginate" segments of history
>     like this, but a better solution is along the lines of:
> 
>       - use "-n 1000" to get 1000 commits in each chunk
> 
>       - use "--boundary" to report the commits that were queued to be
>         traversed next but weren't shown
> 
>       - in invocations after the first one, start the traversal at
>         those boundary commits, rather than HEAD
> 
>     You'll probably need to add "%m" to your format to show the
>     boundaries (or alternatively, you can do the commit selection with
>     rev-list, and then output the result to "log --no-walk --stdin" to
>     do the pretty-printing).
> 
> -Peff
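
For reference, the boundary-based pagination described above could be sketched roughly as follows. This is only a minimal sketch, not a tested recipe for large repositories: the function name `paginate_history` and its arguments are hypothetical, it uses the rev-list variant (selection with `rev-list`, pretty-printing with `log --no-walk --stdin`), and it assumes that a starting point not shown in a chunk has to be carried over by hand, since `--boundary` only reports unshown parents of shown commits.

```shell
# Hypothetical helper (not part of git): print history in chunks of $1
# commits (must be >= 1), starting from revision $2 (default: HEAD).
paginate_history() {
    chunk_size=$1
    tips=$(git rev-parse --verify "${2:-HEAD}^{commit}") || return 1

    while [ -n "$tips" ]; do
        # Select one chunk; rev-list marks boundary commits with a leading "-".
        # $tips is left unquoted on purpose so each hash is a separate argument.
        out=$(git rev-list -n "$chunk_size" --boundary $tips) || return 1
        shown=$(printf '%s\n' "$out" | grep -v '^-')
        boundary=$(printf '%s\n' "$out" | sed -n 's/^-//p')

        # Pretty-print exactly the selected commits, without walking further.
        printf '%s\n' "$shown" |
            git log --no-walk --stdin --numstat \
                --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00

        # Next starting points: the boundary commits, plus any starting point
        # that this chunk did not get around to showing.
        pending=$(printf '%s\n' $tips | grep -vxF -e "$shown")
        tips=$(printf '%s\n' $boundary $pending | sort -u | grep .)
    done
}
```

Something like `paginate_history 1000 > 1.txt` would then replace the repeated `--skip` invocations, and running `git commit-graph write --reachable` once beforehand should make each chunk cheaper. One caveat: with skewed or identical commit dates a commit may be emitted in more than one chunk, so the consumer may need to de-duplicate by hash.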