On Fri, Sep 27, 2024 at 03:48:11PM -0700, Junio C Hamano wrote:
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > By the way there was an unconference breakout session on day 2 of the
> > Git Merge called "Git LFS Can we do better?" where this was discussed
> > with a number of people. Scott Chacon took some notes:
> >
> > https://github.com/git/git-merge/blob/main/breakouts/git-lfs.md
>
> Thanks for a link.
>
> > It was in parallel with the Contributor Summit, so few contributors
> > participated in this session (maybe only Michael Haggerty, John Cai
> > and me). But the impression of GitLab people there, including me, was
> > that folks in general would be happy to have an alternative to Git LFS
> > based on this.
>
> I am not sure what "based on this" is really about, though.
>
> This series adds a feature to redirect requests from one server to
> another, but does it really do much to solve the problem LFS wants
> to solve? I would imagine that you would want to be able to manage
> larger objects separately to avoid affecting the performance and
> convenience of handling smaller objects, and to serve these larger
> objects from a dedicated server. You certainly can filter the
> larger blobs away with the blob size filter, but when you really need
> these larger blobs, it is unclear how the new capability helps, as
> you cannot really tell what criteria the serving side that gave
> you the "promisor-remote" capability wants you to use to sift your
> requests between the original server and the new promisor. Wouldn't
> your requests _all_ be redirected to a single place, the promisor
> remote you learned about via the capability?
>
> Coming up with a better alternative to LFS is certainly good, and it
> is a worthwhile addition to the system. I just do not see how the
> topic of this series helps further that goal.

I guess it helps to address part of the problem. I'm not sure whether
my understanding is aligned with Chris' intention, but I could
certainly see that at some point we start to advertise promisor remote
URLs that use different transport helpers to fetch objects. This would
allow hosting providers to offload objects to e.g. blob storage or
some such thing, and the client would know how to fetch them (the
second sketch at the end of this mail shows how such a setup can be
wired up manually today).

But there are still a couple of pieces missing in the bigger puzzle:

  - How would a client know to omit certain objects? Right now it only
    knows that there are promisor remotes, but it doesn't know that it
    should, for example, omit every blob larger than X megabytes. The
    answer could of course be that the client should just know to do a
    partial clone by themselves (see the first sketch below).

  - Storing those large objects locally is still expensive. We had
    discussions in the past where such objects could be stored
    uncompressed to stop wasting compute here. At GitLab, we're
    thinking about the ability to use rolling hash functions to chunk
    such big objects into smaller parts, which would also allow for
    somewhat efficient deduplication. We're also thinking about how to
    make the overall ODB pluggable such that we can eventually make it
    more scalable in this context. But that's of course thinking quite
    a bit into the future.

  - Local repositories would likely want to prune large objects that
    have not been accessed for a while to eventually regain some
    storage space (see the third sketch below).

I think chipping away at the problems one by one is fine. But it would
be nice to draw something like a "big picture" of where we eventually
want to end up and how all the parts connect with each other to form a
viable native replacement for Git LFS.
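To make the first point above a bit more concrete, here is roughly
what a client-initiated partial clone looks like today. The URL is of
course made up, and the server needs to permit object filtering:

    # Clone while omitting every blob larger than 10MB. The server
    # has to allow this, typically via `uploadpack.allowFilter`.
    $ git clone --filter=blob:limit=10m https://example.com/repo.git
    $ cd repo

    # The origin should now be recorded as a promisor remote, so the
    # omitted blobs are fetched lazily, e.g. when a checkout needs them.
    $ git config remote.origin.promisor
    true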
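And this second sketch shows how one can already wire up a dedicated
large-object server by hand, which is more or less the manual version
of what an advertised "promisor-remote" capability would automate.
Again, the URL and remote name are made up:

    # Add a dedicated server for large objects and mark it as a
    # promisor remote so that lazy fetches may be served from it, too.
    $ git remote add blobs https://blobs.example.com/repo.git
    $ git config remote.blobs.promisor true

    # Optionally record which filter that remote is expected to serve.
    $ git config remote.blobs.partialclonefilter blob:limit=10m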
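As for the third point, there is no access-time-based expiry yet as
far as I know, but if I remember correctly git-repack(1) recently grew
a `--filter=` option that can be (ab)used to evict large blobs from
the local object database, as long as they remain fetchable from a
promisor remote:

    # Rewrite the local packs without blobs larger than 10MB. The
    # filtered-out objects are dropped locally, so this is only safe
    # when they can be re-fetched from a promisor remote.
    $ git repack -a -d --filter=blob:limit=10m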
Also Cc'ing brian, who likely has a thing or two to say about this :)

Patrick