Thanks, Peff!

I will try the recommendations you mentioned for optimizing memory consumption in my task.

Have a nice day,
Yuri

> On 31. Aug 2024, at 0.06, Jeff King <peff@xxxxxxxx> wrote:
>
> On Fri, Aug 30, 2024 at 03:20:15PM +0300, Yuri Karnilaev wrote:
>
>> 2. Processing commits in batches:
>> ```
>> /usr/bin/time -l -h -p git log --ignore-missing --pretty=format:%H%x02%P%x02%aN%x02%aE%x02%at%x00 -n 1000 --skip=1000000 --numstat > 1.txt
>> ```
>> [...]
>> Operating System: Mac OS 14.6.1 (23G93)
>> Git Version: 2.39.3 (Apple Git-146)
>
> I sent a patch which I think should make things better for you, but I
> wanted to mention two things in a more general way:
>
>  1. You should really consider building a commit-graph file with "git
>     commit-graph write --reachable". That will reduce the memory usage
>     for this case, but also improve the CPU quite a bit (we won't have
>     to open those million skipped commits to chase their parent
>     pointers).
>
>     I haven't kept up with the defaults for writing graph files. I
>     thought gc.writeCommitGraph defaults to "true" these days, though
>     that wouldn't help in a freshly cloned repository (arguably we
>     should write the commit graph on clone?).
>
>  2. Using "--skip" still has to traverse all of those intermediate
>     commits. So it's effectively quadratic in the number of commits
>     overall (you end up skipping the first 1000 over and over).
>
>     It's been a while since I've had to "paginate" segments of history
>     like this, but a better solution is along the lines of:
>
>       - use "-n 1000" to get 1000 commits in each chunk
>
>       - use "--boundary" to report the commits that were queued to be
>         traversed next but weren't shown
>
>       - in invocations after the first one, start the traversal at
>         those boundary commits, rather than HEAD
>
>     You'll probably need to add "%m" to your format to show the
>     boundaries (or alternatively, you can do the commit selection with
>     rev-list, and then output the result to "log --no-walk --stdin" to
>     do the pretty-printing).
>
> -Peff
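
For reference, the chunked traversal described in the quoted message could be sketched roughly as below. This is only an illustration of the rev-list variant, not the patch mentioned in the thread. The function name paginate_commits and its arguments are made up for this example. It relies on rev-list printing boundary commits with a leading "-". On histories with clock skew a commit might conceivably show up in more than one chunk, so de-duplicating the output downstream may be prudent.

```shell
# paginate_commits CHUNK [START...]
# Hypothetical helper (not part of git): prints every commit reachable
# from START (default: HEAD), one hash per line, walking the history
# CHUNK commits at a time instead of using --skip.
paginate_commits() {
    chunk=$1
    shift

    # Starting points for the first chunk, one per line.
    if [ "$#" -gt 0 ]; then
        tips=$(printf '%s\n' "$@")
    else
        tips=HEAD
    fi

    while [ -n "$tips" ]; do
        # With --boundary, rev-list also lists the parents of shown
        # commits that were not shown themselves, prefixed with "-".
        out=$(printf '%s\n' "$tips" |
              git rev-list -n "$chunk" --boundary --stdin) || return 1
        [ -n "$out" ] || break

        # Commits that belong to this chunk.
        printf '%s\n' "$out" | grep -v '^-'

        # Boundary commits become the starting points of the next chunk.
        tips=$(printf '%s\n' "$out" | sed -n 's/^-//p')
    done
}
```

Running "paginate_commits 1000" inside a repository prints the hashes chunk by chunk. To get the formatted output, each chunk's hashes could instead be piped to "git log --no-walk --stdin" together with the desired --pretty format, as suggested in the quoted message.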