Re: Proposal for "fetch-any-blob Git protocol" and server design

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/12/2017 03:02 PM, Kevin David wrote:
Hi Jonathan,

I work on the network protocols for the GVFS project at Microsoft.
I shared a couple thoughts and questions below.

Thanks for your reply!

I know we're considering server behavior here, but how large do you generally
expect these blob-want requests to be? I ask because we took an initial approach
very similar to this, however, we had a hard time being clever about figuring out
what set of blobs to request for those clients that didn't want the entire set, and
ended up falling back to single-blob requests.

Obviously, this could be due to thenature of our filesystem-virtualization-based client,
but I also suspect that the teams attacking this problem are more often than not dealing
with very large blob objects, so the cost of a round-trip becomes lower relative to sending
the object content itself.

I am envisioning (1a) as described in Jeff Hostetler's e-mail [1] ("a pre-command or hook to identify needed blobs and pre-fetch them before allowing the actual command to start"), so a Git command would typically make a single request that contains all the blobs required, but my proposal can also handle (1c) ('"fault" them in as necessary in read_object() while the command is running and without any pre-fetch (either synchronously or asynchronously and with/without a helper process)').

Even if we decided to go with single-blob requests and responses, it is still important to send them as packfiles, so that the server can serve them directly from its compressed storage without first having to uncompress them.

[1] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@xxxxxxxxxxxxxxxxx/

Along the same lines as above, this is where we started and it worked well for
low-volume requests. However, when we started ramping up the load,
`pack-objects` operating on a very large packed repository (~150 GiB) became
very computationally expensive, even with `--compression=1 --depth=0 --window=0`.

Being a bit more clever about packing objects (e.g. splitting blobs out from commits
and trees) improved this a bit, but we still hit a bottlenecks from what appeared to
be a large number of memory-mapping operations on a ~140GiB packfile of blobs.

Each `pack-objects` process would consume approximately one CPU core for the
duration of the request. It's possible that further splitting of these large blob packs
would have improved performance in some scenarios, but that would increase the
amount of pack-index lookups necessary to find a single object.

I'm not very experienced with mmap, but I thought that memory-mapping a large file in itself does not incur much of a performance penalty (if any) - it is the accesses that count. I experimented with 15,000 and 150,000 MiB files and mmap and they seem to be handled quite well. Also, how many objects are "pack-objects" packing here?

=== Endpoint support for forward compatibility

This "server" endpoint requires that the first line be understood, but
will ignore any other lines starting with words that it does not
understand. This allows new "commands" to be added (distinguished by
their first lines) and existing commands to be "upgraded" with backwards compatibility.

This seems like a clever way to avoid the canonical `/info/refs?service=git-upload-pack`
capability negotiation on every call. However, using error handling to fallback seems
slightly wonky to me. Hopefully users are incentivized to upgrade their clients.

By "error handling to fallback", do you mean in my proposal or in a possible future one (assuming my proposal is implemented)? I don't think my proposal requires any error handling to fallback (since only new clients can clone partially - old clients will just clone totally and obliviously), but I acknowledge that this proposal does not mean that any future proposal can be done without requiring error handling to fallback.

=== Indication to use the proposed endpoint

The client will probably already record that at least one of its
remotes (the one that it successfully performed a "partial clone"
from) supports this new endpoint (if not, it can’t determine whether a
missing blob was caused by repo corruption or by the "partial clone").
This knowledge can be used both to know that the server supports
"fetch-blob-pack" and "fetch-commit-pack" (for the latter, the client
can fall back to "fetch-pack"/"upload-pack" when fetching from other servers).

This makes a lot of sense to me. When we built our caching proxy, we had to be careful
when designing how we'd handle clients requesting objects missing from the proxy.

For example, a client requests a single blob and the proxy doesn't have it - we can't simply
download that object from the "authoritative" remote and stick it in the `.git\objects\xx\yyy...`
directory, because the repository would be made corrupt.

By proxy, do you mean a Git repository? Sorry, I don't really understand this part.

Having a way to specify that the repo is a "partial clone" and allowing "holes" would help a lot,
I believe. I know there have been varying opinions on how these missing objects should be
"marked" and I'm not ready to propose anything there - just agreeing the problem is important.

Agreed.

Not do derail us too far off blobs, but I wonder if we need a `fetch-commit-pack` endpoint,
or could get away with introducing a new capability (e.g. `no-blobs`) to `upload-pack` instead.
As a casual observer, this seems like it would be a much smaller change since the rest of the
negotiation/reachability calculation would look the same, right? Or would this `fetch-commit-pack`
not return trees either?

I only ask because, in our observations, when git wants to read commits it's
usually followed by a lot of "related" trees - again caveated with the fact that
we're intercepting many things at the filesystem layer.

The main reason for this extra command is not to exclude blobs (which, as you said, can be done with a new capability - I suspect that we will need a capability or parameter of some sort anyway to indicate which size of blobs to filter out) but to eliminate the mandatory ref advertisement that is done whenever the client fetches. One of our use cases (internal Android) has large blobs and many (more than 700k) refs, so it would benefit greatly from blob filtering and elimination of the mandatory ref advertisement (tens of megabytes per fetch).

As for the size of the change, I have a work in progress that implements this [2].

[2] https://public-inbox.org/git/cover.1491851452.git.jonathantanmy@xxxxxxxxxx/

Just to keep the discussion interesting, I'll throw an alternative out there that's
worked well for us. As I understand it, the HTTP-based dumb transfer protocol
supports returning objects in loose object format, but only if they already exist
in loose format.

Extending this to have the remote provide these objects via a "dumb" protocol
when they are packed as well - i.e. the server would "loosens" them upon request -
is basically what we do and it works quite well for low-latency clients. To further improve
performance at the cost of complexity, we've added caching at the memory and disk layer
for these loose objects in the same format we send to the client.

There's a clear tradeoff here - the servers must have adequate disk and/or memory to store
these loose objects in optimal format. In addition, the higher the latency is to the remote,
the worse this solution will perform. Fortunately, in our case, network topology allows us to
put these caching proxies close enough to clients for it not to matter.

This does make sense in the situation you describe, but (as you said) I don't think we can guarantee this in the majority of situations. I think some sort of batching (like the (1a) solution I talked about near the start of this e-mail) and serving packed data from packed storage should form the baseline, and any situation-specific optimizations (e.g. serving unpacked data from topologically-close servers) can be additional steps.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]