Re: [PATCH v2 4/4] bundle v3: the beginning

On Tue, Jun 07, 2016 at 03:19:46PM +0200, Christian Couder wrote:

> >      But there are lots of cases where the server might want to tell
> >      the client that don't involve bundles at all.
> 
> The idea is also that anytime the server needs to send external ODB
> data to the client, it would ask its own external ODB to prepare a
> kind of bundle with that data and use the bundle v3 mechanism to send
> it.
> That may need the bundle v3 mechanism to be extended, but I don't see
> in which cases it would not work.

Ah, I see we do not have the same underlying mental model.

I think the external odb is purely the _client's_ business. The server
does not have to have an external odb at all, and does not need to know
about the client's. The client is responsible for telling the server
during the git protocol anything it would need to know (like "do not
bother sending objects over 50MB; I can get them elsewhere").

This makes the problem much more complicated, but it is more flexible
and decentralized.
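
To make the client-driven model concrete, here is a rough sketch of a
sender honoring a size cutoff that the client announced. Everything in
it is hypothetical: the `Obj` type, the `objects_to_send` helper, and
the `max_blob_size` parameter are made-up names for illustration, not
part of the git protocol or of any existing code.

```python
# Hypothetical sketch: the client advertises a size cutoff and the sender
# omits anything larger. None of these names exist in git itself.
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str  # object id (shortened for readability)
    size: int  # size in bytes


def objects_to_send(wanted, max_blob_size=None):
    """Split `wanted` into (to_send, skipped) using the client's cutoff.

    With no cutoff, everything is sent, as in a normal fetch.
    """
    if max_blob_size is None:
        return list(wanted), []
    to_send = [o for o in wanted if o.size <= max_blob_size]
    skipped = [o for o in wanted if o.size > max_blob_size]
    return to_send, skipped


MB = 1024 * 1024
wanted = [Obj("a1", 2 * 1024), Obj("b2", 80 * MB), Obj("c3", 10 * MB)]

# The client said: "do not bother sending objects over 50MB".
to_send, skipped = objects_to_send(wanted, max_blob_size=50 * MB)
```

The point of the sketch is only that the sender needs no knowledge of
where the client will get the skipped objects; it just needs the cutoff.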

> >        a. The receiving side of a connection (e.g., a fetch client)
> >           somehow has out-of-band access to some objects. How does it
> >           tell the other side "do not bother sending me these objects; I
> >           can get them in another way"?
> 
> I don't see a difference with regular objects that the fetch client
> already has. If it already has some regular objects, a way to tell the
> server "don't bother sending me these objects" is useful already and
> it should be possible to use it to tell the server that there is no
> need to send some objects stored in the external ODB too.

The way to do that with normal objects is by finding shared commit tips,
and assuming the normal git repository property of "if you have X, you
have all of the objects reachable from X".

This whole idea is essentially creating "holes" in that property. You
can enumerate all of the holes, but I am not sure that scales well. We
get a lot of efficiency by communicating only ref tips during the
negotiation, and not individual object names.
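
As a toy illustration of why holes hurt, the sketch below models the
object graph as a plain dictionary. The graph, the object names, and the
`reachable` helper are all invented for the example; real negotiation
is considerably more involved.

```python
# Toy model: each object maps to the objects it references.
# All names here are made up for illustration.
def reachable(graph, tips):
    """Return every object reachable from `tips` (iterative DFS)."""
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))
    return seen


graph = {
    "C2": ["C1", "T2"],              # commit -> parent commit, tree
    "C1": ["T1"],
    "T2": ["small.txt", "big2.bin"],  # tree -> blobs
    "T1": ["small.txt", "big1.bin"],
}

# Normal case: the client names one tip and the other side infers the rest.
client_tips = {"C1"}
assumed = reachable(graph, client_tips)

# With holes, the inference is wrong until every hole is listed by name.
holes = {"big1.bin"}
actually_has = assumed - holes

names_sent_without_holes = len(client_tips)
names_sent_with_holes = len(client_tips) + len(holes)
```

In the normal case the number of names exchanged depends only on the
number of tips; with holes it grows with the number of missing objects.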

> Also something like this is needed for shallow clones and narrow
> clones anyway.

Yes, and I don't think it scales well there, either. A single shallow
cutoff works OK. But if you repeatedly shallow-fetch into a repository,
you end up with a patchwork of disconnected "islands" of history. The
CPU required on the server side to serve those fetch requests is much
greater than what would normally be needed. You can't use things like
reachability bitmaps, and you have to open up the trees for each island
to see which objects the other side actually has.

> >        b. The receiving side of a connection has out-of-band access to
> >           some objects. Some of these will be expensive to get (e.g.,
> >           requiring a large download), and some may be fast (e.g.,
> >           they've already been fetched to a local cache). How do we tell
> >           the sending side not to assume we have cheap access to these
> >           objects (e.g., for use as a delta base)?
> 
> I don't think we need to tell the sending side we have cheap access or
> not to some objects.
> If the objects are managed by the external ODB, it's the external ODB
> on the server and on the client that will manage these objects. They
> should not be used as delta bases.
> Perhaps there is no mechanism to say that some objects (basically all
> external ODB managed objects) should not be used as delta bases, but
> that could be added.

Yes, I agree that _if_ the server can access the list of objects
available in the external odb, this becomes much easier. I'm just not
convinced that level of coupling is a good idea.

Note that the server would also want to take this into account during
repacking, as otherwise you end up with fetches that are very expensive
to serve. Say you want to send X, which is stored as a delta against Y,
but you know that Y is available via the external odb and therefore
should not be used as a base. You have to throw out the delta for X and
either send it whole or compute a new one. That's much more expensive
than blitting the delta from disk, which is what a normal clone would
do.
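
A minimal sketch of that decision, under the assumption that the server
does know the set of externally-stored objects. The `PackEntry` type,
the `plan_send` helper, and the three action labels are hypothetical
and only mirror the reasoning above, not pack-objects itself.

```python
# Hypothetical sketch of the per-object choice when serving a fetch.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PackEntry:
    name: str
    base: Optional[str]  # delta base on disk, or None if stored whole


def plan_send(entries, external):
    """Map each object to send onto the cheapest allowed action.

    `external` is the set of objects living in the external odb, which
    must not serve as delta bases.
    """
    plan = {}
    for e in entries:
        if e.base is None:
            plan[e.name] = "reuse-whole"   # copy bytes straight from disk
        elif e.base in external:
            plan[e.name] = "recompute"     # on-disk delta is unusable
        else:
            plan[e.name] = "reuse-delta"   # copy the delta straight from disk
    return plan


entries = [
    PackEntry("X", base="Y"),   # delta against an external object
    PackEntry("Z", base="W"),   # delta against an ordinary object
    PackEntry("W", base=None),  # stored whole
]
plan = plan_send(entries, external={"Y"})
```

Every "recompute" entry is work the server does at fetch time; a
repack that avoided external bases up front would leave none.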

-Peff