Taylor Blau <me@xxxxxxxxxxxx> wrote on Sun, Apr 9, 2023 at 09:28:
>
> On Sat, Apr 08, 2023 at 02:27:53PM +0800, ZheNing Hu wrote:
> > Okay, you're right. It's not "ungraceful" to have each task do its own thing.
> > I should clarify that for a command like `git cat-file --batch-all-objects`,
> > which traverses all objects, it would be better to have a filter. It might be
> > more performant than using `git rev-list --filter | git cat-file --batch`?
>
> Perhaps slightly so, since there is naturally going to be some
> duplicated effort spawning processes, loading any shared libraries,
> initializing the repository and reading its configuration, etc.
>
> But I'd wager that these are all a negligible cost when compared to the
> time we'll have to spend reading, inflating, and printing out all of the
> objects in your repository.
>

What you said makes sense. I implemented a --type-filter option for
git cat-file and compared the performance of outputting all blobs in the
git repository with and without the type filter. I found that the
difference was not significant.

    time git cat-file --batch-all-objects --batch-check="%(objectname) %(objecttype)" |
    awk '{ if ($2 == "blob") print $1 }' | git cat-file --batch > /dev/null

    17.10s user 0.27s system 102% cpu 16.987 total

    time git cat-file --batch-all-objects --batch --type-filter=blob >/dev/null

    16.74s user 0.19s system 95% cpu 17.655 total

At first, I thought the processes that provide all blob OIDs, whether
git rev-list or git cat-file --batch-all-objects --batch-check, might
waste CPU, I/O, and memory, because they have to read a large number of
objects which are then read again by git cat-file --batch. However, it
seems that this is not actually the performance bottleneck.

> Hopefully any task(s) where that cost *wouldn't* be negligible relative
> to the rest of the job would be small enough that they could fit into a
> single process.
>
> > I don't think so. While `git rev-list` traverses objects and performs
> > filtering within a revision, `git cat-file --batch-all-objects` traverses
> > all loose and packed objects. It might be difficult to perfectly
> > extract the filtering from `git rev-list` and apply it to `git cat-file`.
>
> `rev-list`'s `--all` option does exactly the former: it looks at all
> loose and packed objects instead of doing a traditional object walk.
>
> Thanks,
> Taylor
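
For reference, the multi-process pipeline timed in this message can be wrapped in a small shell sketch. The function names `filter_by_type` and `dump_objects_of_type` are hypothetical and chosen only for illustration; the sketch uses only existing `git cat-file` options and does not depend on the --type-filter patch.

```shell
#!/bin/sh

# filter_by_type TYPE
# Reads "<oid> <type>" lines on stdin and prints the oid of every line
# whose second field equals TYPE. (Hypothetical helper name.)
filter_by_type() {
    awk -v want="$1" '$2 == want { print $1 }'
}

# dump_objects_of_type TYPE
# Lists every loose and packed object in the current repository, keeps
# only the objects of the requested type, and prints their contents.
# (Hypothetical helper name; must be run inside a git repository.)
dump_objects_of_type() {
    git cat-file --batch-all-objects \
        --batch-check='%(objectname) %(objecttype)' |
    filter_by_type "$1" |
    git cat-file --batch
}
```

Running `time dump_objects_of_type blob >/dev/null` inside a repository should correspond to the first measurement above, assuming a comparable shell `time` builtin.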