On Wed, 27 Sep 2017 18:46:30 +0200
Christian Couder <christian.couder@xxxxxxxxx> wrote:

> I am ok to split the patch series, but I am not sure that 01/40 to
> 09/40 is the right range for the first patch series.
> I would say that 01/40 to 07/40 is better as it can be seen as a
> separate refactoring.

I mentioned 09/40 because this is (as far as I can tell) the first one
that introduces a new design.

> I don't think single-shot processes would be a huge burden, because
> the code is simpler, and because for example for filters we already
> have single-shot and long-running processes and no one complains
> about that. It's code that is useful as it makes it much easier for
> people to do some things (see the clone bundle example).
>
> In fact in Git development we usually start by first implementing
> simpler single-shot solutions, before thinking, when the need
> arises, about making it faster. So perhaps an equally valid opinion
> could be to first only submit the patches for the single-shot
> protocol and later submit the rest of the series when we start
> getting feedback about how external odbs are used.

My concern is that, as far as I understand about the Microsoft use
case, we already know that we need the faster solution, so the need
has already arisen.

> And yeah I could change the order of the patch series to implement
> the long-running processes first and the single-shot process last,
> so that it could be possible to first get feedback about the
> long-running processes, before we decide whether or not to merge the
> single-shot stuff, but I don't think it would be the most logical
> order.

My thinking was that we would just implement the long-running process
and not implement the single-shot process at all (besides maybe a
script in contrib/). If we are going to do both anyway, I agree that
we should do the single-shot process first.

> > And another possible issue is that we design ourselves into a
> > corner. Thinking about the use cases that I know about (the
> > Android use case and the Microsoft GVFS use case), I don't think
> > we are doing that - for Android, this means that large blob
> > metadata needs to be part of the design (and this patch series
> > does provide for that), and for Microsoft GVFS, "get" is
> > relatively cheap, so a configuration option to not invoke "have"
> > first when loading a missing object might be sufficient.
>
> If the helper does not advertise the "have" capability, the "have"
> instruction will not be sent to the helper, so the current design is
> already working for that case.

Ah, that's good to know.
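To check my understanding of that, here is roughly how I picture the
check on the Git side - purely a sketch, and every identifier in it
(the capability bits, the struct, the helper functions) is a made-up
stand-in rather than something taken from your patches:

  /*
   * Sketch only: hypothetical names, not identifiers from the series.
   */
  #define ODB_CAP_GET	(1u << 0)
  #define ODB_CAP_HAVE	(1u << 1)

  struct odb_helper {
  	const char *cmd;	/* helper command from the config */
  	unsigned capabilities;	/* parsed from the helper's advertisement */
  };

  /* Send "have <sha1>" and parse the reply; hypothetical. */
  extern int odb_helper_send_have(struct odb_helper *helper,
  				  const unsigned char *sha1);

  /*
   * Returns nonzero if the helper may have the object.  A helper that
   * did not advertise "have" is never asked, so the caller simply
   * falls through to a plain "get".
   */
  static int odb_helper_may_have_object(struct odb_helper *helper,
  					const unsigned char *sha1)
  {
  	if (!(helper->capabilities & ODB_CAP_HAVE))
  		return 1;	/* "have" is never sent to this helper */
  	return odb_helper_send_have(helper, sha1);
  }

If that is how it works, a GVFS-style helper gets the cheap path
simply by not advertising "have", without needing any extra
configuration option on top.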
> > And I think that my design can be extended to support a use case
> > in which, for example, blobs corresponding to a certain type of
> > filename (defined by a glob like in gitattributes) can be excluded
> > during fetch/clone, much like --blob-max-bytes, and they can be
> > fetched either through the built-in mechanism or through a custom
> > hook.
>
> Sure, we could probably rebuild something equivalent to what I did
> on top of your design.
> My opinion though is that if we want to eventually get to the same
> goal, it is better to first merge something that gets us very close
> to the end goal and then add some improvements on top of it.

I agree - I mentioned that because I personally prefer to review
smaller patch sets at a time, and my patch set already includes a lot
of the same infrastructure needed by yours - for example, the places
in the code to dynamically fetch objects, exclusion of objects when
fetching or cloning, configuring the cloned repo when cloning, fsck,
and gc.

> > - I get compile errors when I "git am" these onto master. I think
> >   '#include "config.h"' is needed in some places.
>
> It's strange because I get no compile errors even after a "make
> clean" from my branch.
> Could you show the actual errors?

I don't have the error messages with me now, but it was something
about a function being implicitly declared. You will probably get
these errors if you sync past commit e67a57f ("config: create
config.h", 2017-06-15).

> > Any reason why you prefer to update the loose object functions
> > rather than the generic one (sha1_object_info_extended)? My
> > concern with just updating the loose object functions was that a
> > caller might have obtained the path by iterating through the loose
> > object dirs, and in that case we shouldn't query the external ODB
> > for anything.
>
> You are thinking about fsck or gc?
> Otherwise I don't think it would be clean to iterate through loose
> object dirs.

Yes, fsck and gc (well, prune, I think) do that. I agree that Git
typically doesn't do that (except for exceptional cases like fsck and
gc), but I was thinking about supporting existing code that does that
iteration, not introducing new code that does it.
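To illustrate the kind of thing I had in mind with the generic entry
point - again only a sketch, in which everything except
sha1_object_info_extended() is a hypothetical name and not code from
either of our series:

  struct object_info;	/* as declared in cache.h */

  /* Hypothetical: look in packs and loose objects only. */
  extern int lookup_local_object(const unsigned char *sha1,
  			         struct object_info *oi);
  /* Hypothetical: ask the helpers to write the object locally. */
  extern int external_odb_get_object(const unsigned char *sha1);

  #define OBJECT_INFO_NO_EXTERNAL	0x100	/* hypothetical flag */

  int sha1_object_info_extended(const unsigned char *sha1,
  			        struct object_info *oi, unsigned flags)
  {
  	if (!lookup_local_object(sha1, oi))
  		return 0;
  	if (flags & OBJECT_INFO_NO_EXTERNAL)
  		return -1;
  	/*
  	 * Only this generic entry point falls back to the external
  	 * ODB.  Code that found a path by scanning .git/objects/??
  	 * (fsck, prune) calls the loose-object functions with that
  	 * path and never gets here.
  	 */
  	if (!external_odb_get_object(sha1))
  		return sha1_object_info_extended(sha1, oi,
  						 flags | OBJECT_INFO_NO_EXTERNAL);
  	return -1;
  }

That way, code that is holding a path it got from iterating the loose
object directories keeps calling the loose-object functions directly
and cannot trigger an external fetch by accident.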