Re: [Question] Can git cat-file have a type filtering option?

On Sun, Apr 9, 2023 at 09:28, Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Sat, Apr 08, 2023 at 02:27:53PM +0800, ZheNing Hu wrote:
> > Okay, you're right. It's not "ungraceful" to have each task do its own thing.
> > I should clarify that for a command like `git cat-file --batch-all-objects`,
> > which traverses all objects, it would be better to have a filter. It might be
> > more performant than using `git rev-list --filter | git cat-file --batch`?
>
> Perhaps slightly so, since there is naturally going to be some
> duplicated effort spawning processes, loading any shared libraries,
> initializing the repository and reading its configuration, etc.
>
> But I'd wager that these are all a negligible cost when compared to the
> time we'll have to spend reading, inflating, and printing out all of the
> objects in your repository.
>

What you said makes sense. I implemented a --type-filter option for
git cat-file and compared the performance of outputting all blobs in the
git repository with and without the type filter. I found that the
difference was not significant.

time git cat-file --batch-all-objects \
    --batch-check="%(objectname) %(objecttype)" |
    awk '{ if ($2 == "blob") print $1 }' |
    git cat-file --batch > /dev/null
17.10s user 0.27s system 102% cpu 16.987 total

time git cat-file --batch-all-objects --batch --type-filter=blob >/dev/null
16.74s user 0.19s system 95% cpu 17.655 total

At first, I thought that the processes that produce the list of blob OIDs
(git rev-list, or git cat-file --batch-all-objects --batch-check) might
waste CPU, I/O, and memory, because they have to read a large number of
objects that git cat-file --batch then reads again. However, it seems
that this is not actually the performance bottleneck.

> Hopefully any task(s) where that cost *wouldn't* be negligible relative
> to the rest of the job would be small enough that they could fit into a
> single process.
>
> > I don't think so. While `git rev-list` traverses objects and performs
> > filtering within a revision, `git cat-file --batch-all-objects` traverses
> > all loose and packed objects. It might be difficult to perfectly
> > extract the filtering from `git rev-list` and apply it to `git cat-file`.
>
> `rev-list`'s `--all` option does exactly the former: it looks at all
> loose and packed objects instead of doing a traditional object walk.
>
> Thanks,
> Taylor



