Hi,

On Tue, Feb 26, 2019 at 12:45 AM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>
> Christian Couder wrote:
> > On Sun, Feb 24, 2019 at 12:39 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> > Especially I'd like to know what should the client do if they find out
> > that for example a repo that contains a lot of large files is
> > configured so that the large files should be fetched from a CDN that
> > the client cannot use? Is the client forced to find or setup another
> > repo configured differently if the client still wants to use CDN
> > offloading?
>
> The example from that message:
>
> For example I think the Great Firewall of China lets people in China
> use GitHub.com but not Google.com. So if people start configuring
> their repos on GitHub so that they send packs that contain Google.com
> CDN URLs (or actually anything that the Firewall blocks), it might
> create many problems for users in China if they don't have a way to
> opt out of receiving packs with those kind of URLs.
>
> But the same thing can happen with redirects, with embedded assets in
> web pages, and so on.

I don't think it's the same situation, because CDN offloading is likely
to be used for large objects that hosting sites like GitHub, GitLab and
Bitbucket might not be willing to store for free on their machines. (I
think the current limits are around 10GB or 20GB, everything included,
per user.)

So it's likely that users will want to host on such sites incomplete
repos that use CDN offloading to a CDN on another site. And then, if
that CDN is not accessible for some reason, things will completely
break when users clone.

You could say that it's the same issue as when a video embedded in a
web page is not available, but a web browser can still render the rest
of the page when a video is missing. So I don't think it's the same
kind of issue.

And by the way, that's a reason why I think it's important to think
about this in relation to promisor/partial clone remotes: with them,
it's less of a big deal if the CDN is unavailable, temporarily or not,
for some reason.

> I think in this situation the user would likely
> (and rightly) blame the host (github.com) for requiring access to a
> separate inaccessible site, and the problem could be resolved with
> them.

The host will say that it's the repo admins' responsibility to use a
CDN that works for the repo users (or to pay for more space on the
host). Then repo admins will say that they use this CDN because it's
simpler for them, or the only thing they can afford or deal with. (For
example, I don't think it would be easy for Westerners to use a Chinese
CDN.) Then users will likely blame Git for not supporting a way to use
a different CDN than the one configured in each repo.

> The beauty of this is that it's transparent to the client: the fact
> that packfile transfer was offloaded to a CDN is an implementation
> detail, and the server takes full responsibility for it.

Who is "the server" in real life? Are you sure they would be ok with
taking full responsibility?

And yes, I agree that transparency for the client is nice. And if it's
really nice, then why not have it for promisor/partial clone remotes
too? But then do we really need duplicated functionality between
promisor remotes and CDN offloading?

I also think that in real life there needs to be an easy way to
override this transparency, and we already have that with promisor
remotes.
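For example, with the kind of promisor remote support I have in mind, the
override could look something like this (the hosts below are made up, and
the exact config keys are of course open to discussion):

    # Clone without the large blobs, then point Git at a CDN/mirror
    # that is actually reachable from the user's network.
    git clone --filter=blob:none https://githost.example.com/repo.git
    cd repo
    git remote add my-cdn https://mirror.example.org/repo.git
    git config remote.my-cdn.promisor true
    git config remote.my-cdn.partialCloneFilter blob:none
    # Missing objects can then be fetched from "my-cdn" on demand.

The point is that the user, and not only the repo admin, gets to decide
where the missing objects come from.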
> This doesn't stop a hosting provider from using e.g. server options to
> allow the client more control over how their response is served, just
> like can be done for other features of how the transfer works (how
> often to send progress updates, whether to prioritize latency or
> throughput, etc).

Could you give a more concrete example of what could be done?

> What the client *can* do is turn off support for packfile URLs in a
> request completely. This is required for backward compatibility and
> allows working around a host that has configured the feature
> incorrectly.

If the full content of a repo is really large, the packfile sent by an
initial clone could be really big, and many client machines might not
have enough memory to deal with it. And this supposes that repo hosting
providers would be willing to host very large repos in the first place.

With promisor remotes, it's less of a problem if, for example:

- a repo hosting provider is not ok with very large repos,
- a CDN is unavailable,
- a repo admin has not configured some repos very well.

Thanks for your answer,
Christian