Re: [PATCH 0/3] list-object-filter: introduce depth filter

Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote on Wed, Sep 7, 2022 at 18:18:
>
> Hi ZheNing,
>
> On Sun, 4 Sep 2022, ZheNing Hu wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote on Fri, Sep 2, 2022 at 21:48:
> >
> > > [...]
> > > When you have all the commit and tree objects on the local side,
> > > you can enumerate all the blob objects you need in one fell swoop, then
> > > fetch them in a single network round trip.
> > >
> > > When you lack tree objects, or worse, commit objects, this is not true.
> > > You may very well need to fetch _quite_ a bunch of objects, then inspect
> > > them to find out that you need to fetch more tree/commit objects, and then
> > > a couple more round trips, before you can enumerate all of the objects you
> > > need.
> >
> > I think this is because the previous design was that you had to fetch
> > these missing commits (also trees) and all their ancestors. Maybe we can
> > modify git rev-list to make it understand missing commits...
>
> We do have such a modification, and it is called "shallow clone" ;-)
>
> Granted, shallow clones are not a complete solution and turned out to be a
> dead end (i.e. that design cannot be extended into anything more useful).

Yeah, the depth filter might be able to overcome this shortcoming, but
it may incur a lot of network overhead in some special cases.
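
For comparison, a rough sketch of the two approaches (the
--filter=depth:<N> syntax below is just what this series proposes, it
is not in git yet):

    # shallow clone: history is simply cut off at the given depth
    git clone --depth=1 https://example.com/repo.git

    # proposed depth filter: objects deeper than <N> are omitted
    # from the initial packfile, but can still be fetched lazily
    git clone --filter=depth:1 https://example.com/repo.git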

> But that approach demonstrates what it would take to implement a logic
> whereby Git understands that some commit ranges are missing and should not
> be fetched automatically.
>

Agreed. Git uses commit grafts to do so.

> > > [...] it is hard to think of a way how the design could result in
> > > anything but undesirable behavior, both on the client and the server
> > > side.
> > >
> > > We also have to consider that our experience with large repositories
> > > demonstrates that tree and commit objects delta pretty well and are
> > > virtually never a concern when cloning. It is always the sheer amount
> > > of blob objects that is causing poor user experience when performing
> > > non-partial clones of large repositories.
> >
> > Thanks, I think I understand the problem here. By the way, does it make
> > sense to download just some of the commits/trees in some big repository
> > which have several million commits/trees?
>
> It probably only makes sense if we can come up with a good idea how to
> teach Git the trick to stop downloading so many objects in costly
> roundtrips.
>

Good advice. Perhaps we should merge these multiple requests into one.
Maybe we could use a blob:none filter to download all the missing
trees/commits when we need to iterate through the whole commit history.
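
For instance, a minimal sketch (the URL is just a placeholder):

    # partial clone: all commits and trees, but no blobs
    git clone --filter=blob:none https://example.com/big-repo.git
    cd big-repo

    # walking history needs no blobs, so this costs no extra round
    # trips; blobs are only fetched on demand when actually needed
    git log --oneline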

> But I wonder whether your scenarios are so different from the ones I
> encountered, in that commit and tree objects do _not_ delta well on your
> side?
>
> If they _do_ delta well, i.e. if it is comparatively cheap to just fetch
> them all in one go, it probably makes more sense to just drop the idea of
> fetching only some commit/tree objects but not others in a partial clone,
> and always fetch all of 'em.
>

Delta compression is a wonderful thing most of the time (in cases where
bulk fetching is required). But sometimes users just want to see the
message of a single commit, so why should they have to download other
commits/trees that are not required?

Sometimes users may understand the access patterns of their git objects
better than the git server does. It might be nice if the user could
download a specific object by its object id (that is only possible for
blobs now, right?)
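
As far as I know, something like the following already works in a
partial clone (the oid is a placeholder, and the direct fetch depends
on the server's configuration):

    # on-demand fetch: reading a missing object pulls it from the
    # promisor remote automatically
    git cat-file -p <blob-oid>

    # explicit fetch of a single object by oid; this requires
    # uploadpack.allowAnySHA1InWant on the server side
    git fetch origin <blob-oid>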

> > > Now, I can be totally wrong in my expectation that there is _no_ scenario
> > > where cloning with a "partial depth" would cause anything but poor
> > > performance. If I am wrong, then there is value in having this feature,
> > > but since it causes undesirable performance in all cases I can think of,
> > > it definitely should be guarded behind an opt-in flag.
> >
> > Well, now I think this depth filter might be a better fit for git fetch.
>
> I disagree here, because I see all the same challenges as I described for
> clones missing entire commit ranges.
>

Oh, a prerequisite is missing here: after we have all the commits and
trees, we can then use the depth filter to download the missing blobs.
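
In other words, the workflow I imagine looks roughly like this (again
assuming the --filter=depth:<N> syntax from this series; step 2 is not
something git can do today):

    # step 1: get all commits and trees up front, but no blobs
    git clone --filter=blob:none https://example.com/repo.git

    # step 2 (proposed): backfill the blobs of the most recent
    # commits in one request instead of many on-demand round trips
    git fetch --filter=depth:1 origin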

> > If git checkout or other commands which just need to check
> > few commits, and find almost all objects (maybe >= 75%) in a
> > commit are not local, it can use this depth filter to download them.
>
> If you want a clone that does not show any reasonable commit history
> because it does not fetch commit objects on-the-fly, then we already have
> such a thing with shallow clones.
>
> The only way to make Git's revision walking logic perform _somewhat_
> reasonably would be to teach it to fetch not just a single commit object
> when it was asked for, but to somehow pass a desired depth by which to
> "unshallow" automatically.
>
> However, such a feature would come with the same undesirable implications
> on the server side as shallow clones (fetches into shallow clones are
> _really_ expensive on the server side).
>

Agreed. Making git shallow clones smarter might work, but there are
big challenges too.
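
For reference, today such deepening has to be requested manually; the
idea would be to trigger something like this automatically:

    # deepen an existing shallow clone by 10 more commits; these
    # requests are the expensive part on the server side
    git fetch --deepen=10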

> Ciao,
> Dscho

Thanks,
ZheNing Hu



