(It looks like I did not reply to this other email yet, sorry about this late reply.)

On Wed, Jul 12, 2017 at 9:06 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> On Tue, 20 Jun 2017 09:54:34 +0200
> Christian Couder <christian.couder@xxxxxxxxx> wrote:
>
>> Git can store its objects only in the form of loose objects in
>> separate files or packed objects in a pack file.
>>
>> To be able to better handle some kinds of objects, for example big
>> blobs, it would be nice if Git could store its objects in other object
>> databases (ODB).
>
> Thanks for this, and sorry for the late reply. It's good to know that
> others are thinking about "missing" objects in repos too.
>
>> - "have": the helper should respond with the sha1, size and type of
>>   all the objects the external ODB contains, one object per line.
>
> This should work well if we are not caching this "have" information
> locally (that is, if the object store can be accessed with low
> latency), but I am not sure if this will work otherwise.

Yeah, there could be problems related to caching or not caching the
"have" information. As a repo should not send the blobs that are in an
external odb, I think it could be useful to cache the "have"
information. I plan to take a look and add related tests soon.

> I see that you have proposed a local cache-using method later in the
> e-mail - my comments on that are below.
>
>> - "get <sha1>": the helper should then read from the external ODB
>>   the content of the object corresponding to <sha1> and pass it to
>>   Git.
>
> This makes sense - I have some patches [1] that implement this with
> the "fault_in" mechanism described in your e-mail.
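To make the "have"/"get" protocol concrete, here is a minimal sketch of what a helper could look like. This is only an illustration: the flat-file layout under ODB_DIR (one file per object, named by its sha1, all assumed to be blobs) is an assumption for the example, not something the series prescribes.

```shell
#!/bin/sh
# Sketch of an external ODB helper speaking the "have"/"get" protocol.
# ODB_DIR and its one-file-per-sha1 layout are illustrative assumptions.
ODB_DIR="${ODB_DIR:-/tmp/external-odb}"

odb_helper() {
	case "$1" in
	have)
		# Respond with "<sha1> <size> <type>", one object per line.
		for f in "$ODB_DIR"/*; do
			[ -f "$f" ] || continue
			printf '%s %s blob\n' \
				"$(basename "$f")" "$(wc -c <"$f" | tr -d ' ')"
		done
		;;
	get)
		# Stream the raw content of object <sha1> to stdout.
		cat "$ODB_DIR/$2"
		;;
	esac
}
```

So "odb_helper have" would list everything the external odb contains, and "odb_helper get <sha1>" would print one object's content for Git to store.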
> [1] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@xxxxxxxxxx/
>
>> * Transferring information
>>
>> To transfer information about the blobs stored in an external ODB,
>> some special refs, called "odb refs", similar to replace refs, are
>> used in the tests of this series, but in general nothing forces the
>> helper to use that mechanism.
>>
>> The external odb helper is responsible for using and creating the
>> refs in refs/odbs/<odbname>/, if it wants to do that. It is free,
>> for example, to create just one ref, as it is also free to create
>> many refs. Git would just transmit the refs that have been created
>> by this helper, if Git is asked to do so.
>>
>> For now in the tests there is one odb ref per blob, as it is simple
>> and similar to what git-lfs does. Each ref name is
>> refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob
>> stored in the external odb named <odbname>.
>>
>> These odb refs point to a blob that is stored in the Git repository
>> and contains information about the blob stored in the external odb.
>> This information can be specific to the external odb. The repos can
>> then share this information using commands like:
>>
>>   `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`
>>
>> At the end of the current patch series, "git clone" is taught an
>> "--initial-refspec" option, which asks it to first fetch some
>> specified refs. This is used in the tests to fetch the odb refs
>> first.
>>
>> This way a single "git clone" command can set up a repo using the
>> external ODB mechanism, as long as the right helper is installed on
>> the machine and the following options are used:
>>
>> - "--initial-refspec <odbrefspec>" to fetch the odb refspec
>> - "-c odb.<odbname>.command=<helper>" to configure the helper
>
> A method like this means that information about every object is
> downloaded, regardless of which branches were actually cloned, and
> regardless of what parameters (e.g. max blob size) were used to
> control the objects that were actually cloned.
>
> We could make, say, one "odb ref" per size and branch - for example,
> "refs/odbs/master/0", "refs/odbs/master/1k", "refs/odbs/master/1m",
> etc. - and have the client know which one to download. But this
> wouldn't scale if we introduce different object filters in the clone
> and fetch commands.

Yeah, there are multiple ways to do that.

> I think that it is best to have upload-pack send this information
> together with the packfile, since it knows exactly what objects were
> omitted, and therefore what information the client needs. As
> discussed in a sibling e-mail, clone/fetch already needs to be
> modified to omit objects anyway.

I try to avoid sending this information as I don't think it is
necessary, and not having to change the communication protocol
simplifies things a lot.
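For reference, the odb-ref transfer described earlier in this thread can be exercised with stock git commands, since odb refs are ordinary refs pointing at an information blob. In this sketch, "magic" as the odb name, the sha1 "1234abcd", and the "stored-at:" content of the information blob are all illustrative, not part of the series:

```shell
# Sketch: one odb ref per blob, shared between two repos via a refspec.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# "Server" side: store the per-blob information as a blob in the git
# repository, and point an odb ref at it.
git init -q server
git -C server update-ref refs/odbs/magic/1234abcd \
	"$(echo 'stored-at: https://example.com/objects/1234abcd' |
	   git -C server hash-object -w --stdin)"

# "Client" side: fetch only the odb refs, then read the information
# blob they point to.
git init -q client
git -C client fetch -q ../server 'refs/odbs/magic/*:refs/odbs/magic/*'
git -C client cat-file blob refs/odbs/magic/1234abcd
```

The last command prints the information stored for that blob, which is what a helper would consult to find the object in the external odb.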