Re: [RFC PATCH] Updated "imported object" design

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Fri, 18 Aug 2017 16:33:11 -0700

On Fri, 18 Aug 2017 10:18:37 -0400
Ben Peart <peartben@xxxxxxxxx> wrote:

> > But if there was a good way to refer to the "anti-projection" in a
> > virtualized system (that is, the "real" thing or "object" behind the
> > "virtual" thing or "image"), then maybe the "virtualized" language is
> > the best. (And I would gladly change - I'm having a hard time coming up
> > with a name for the "anti-projection" in the "lazy" language.)
> > 
> 
> The most common "anti-virtual" language I'm familiar with is "physical."
> Virtual machine <-> physical machine. Virtual world <-> physical
> world. Virtual repo, commit, tree, blob - physical repo, commit, tree,
> blob. I'm not thrilled but I think it works...

I was thinking more of a name for the "entity that projects the
virtualization" than for the opposite of a "virtualization". "Physical"
might work for the latter but probably not for the former.

After some in-office discussion, if we stick to the "promise" concept,
maybe we have something like this:

  In a partial clone, the origin acts as a promisor of objects. Every
  object obtained from the promisor also acts as a promise that any
  object directly or indirectly referenced from that object is fetchable
  from the promisor.
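
As a rough illustration of that wording, here is a minimal Python sketch
(not Git code, and not a proposal for the actual implementation). The
names promised_oids and refs_of and the toy object IDs are hypothetical.
It assumes we can list the objects obtained from the promisor and read
the direct references of any object we have locally. A missing object
contributes no references, since we cannot parse it.

```python
from typing import Dict, Iterable, Set


def promised_oids(promisor_objects: Iterable[str],
                  refs_of: Dict[str, Set[str]]) -> Set[str]:
    """Return every object ID that is covered by a promise.

    promisor_objects: IDs of objects obtained from the promisor.
    refs_of: direct references of each locally present object
             (commit -> tree and parents, tree -> entries, ...).
             Objects that are missing locally have no entry.
    """
    promised: Set[str] = set()
    seen: Set[str] = set()
    stack = list(promisor_objects)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        # Everything referenced from a promisor object, directly or
        # indirectly, is promised to be fetchable from the promisor.
        for ref in refs_of.get(oid, ()):
            promised.add(ref)
            stack.append(ref)
    return promised


# Toy partial clone: the commit and tree are present, the blobs are not.
refs = {"commit1": {"tree1"}, "tree1": {"blob1", "blob2"}}
promised = promised_oids(["commit1", "tree1"], refs)
```

In this toy repository, blob1 and blob2 are missing but promised, so
their absence would not count as corruption.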

> > This is not true if you're fetching from another repo 
> 
> This isn't a case we've explicitly dealt with (multiple remotes into a 
> virtualized repo).  Our behavior today would be that once you set the 
> "virtual repo" flag on the repo (this happens at clone for us), all 
> remotes are treated as virtual as well (i.e., we don't differentiate 
> behavior based on which remote was used).  Our "custom fetcher" always 
> uses "origin" and some custom settings for a cache-server saved in the 
> .git/config file when asked to fetch missing objects.
> 
> This is probably a good model to stick with at least initially as trying 
> to solve multiple possible "virtual" remotes as well as mingling 
> virtualized and non-virtualized remotes and all the mixed cases that can 
> come up makes my head hurt.  We should probably address that in a 
> different thread. :)

OK, let's stick to the current model for now. It holds whichever view we
take on other remotes: (1) "we won't have any other remotes so we don't
care", (2) "we have other remotes but it's fine to make sure that they
don't introduce any new missing objects", or (3) "we need other remotes
to introduce missing objects, but we can build that after this
foundation is laid".

> > or if you're using
> > receive-pack, but (1) I think these are not used as much in such a
> > situation, and (2) if you do use them, the slowness only "kicks in" if
> > you do not have the objects referred to (whether non-"imported" or
> > "imported") and thus have to check the references in all "imported"
> > objects.
> > 
> 
> Is there any case where receive-pack is used on the client side?  I'm 
> only aware of it being used on the server side to receive packs pushed 
> from the client.  If it is not used in a virtualized client, then we 
> would not need to do anything different for receive-pack.

This happens if another repo decides to push to the virtualized client,
which (as I wrote) I don't expect to happen often. My intention is to
ensure that receive-pack will still work.
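
To show why the cost only "kicks in" when something is actually missing,
here is a small Python sketch of a connectivity check that builds the
set of promised objects lazily. This is an illustration under
assumptions, not receive-pack's code: check_connectivity, refs_of and
load_promised are hypothetical names, and load_promised stands in for
the expensive scan over all objects obtained from the promisor.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set


def check_connectivity(tips: Iterable[str],
                       refs_of: Dict[str, Set[str]],
                       load_promised: Callable[[], Set[str]]) -> List[str]:
    """Return IDs that are missing locally and not promised.

    tips: object IDs the incoming push or fetch wants us to accept.
    refs_of: direct references of every locally present object; an
             object is considered present iff it has an entry here.
    load_promised: expensive callback that scans the promisor objects
                   and returns the set of IDs they reference.
    """
    promised: Optional[Set[str]] = None
    broken: List[str] = []
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in refs_of:
            # Present locally: keep walking its references.
            stack.extend(refs_of[oid])
            continue
        # Missing locally: only now pay for the scan, and only once.
        if promised is None:
            promised = load_promised()
        if oid not in promised:
            broken.append(oid)
    return broken
```

If every object reachable from the tips is present, load_promised is
never called, so the common case costs the same as it does today.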

> That is another good point.  Given the discussion above about not 
> needing to do the connectivity test for fetch/clone - the potential perf 
> hit of loading/parsing all the various objects to build up the oidset is 
> much less of an issue.

Agreed.