> On 10 Jan 2017, at 00:38, Taylor Blau <ttaylorr@xxxxxxxxxx> wrote:
>
> I've been considering some alternative approaches in order to make the
> communication between Git and any extension that implements this protocol more
> intuitive.
>
> In particular, I'm considering alternatives to:
>
>> for each delayed paths:
>>     ensure filter process finished processing for path
>>     fetch the thing to buf from the process
>>     do the caller's thing to use buf
>
> As I understand it, the above sequence of steps would force Git to either:
>
> a) loop over all delayed paths and ask the filter if it's done processing,
> creating a busy-loop between the filter and Git, or...
> b) loop over all delayed paths sequentially, checking out each path in sequence
>
> I would like to avoid both of those situations, and instead opt for an
> asynchronous approach. In (a), the protocol is far too chatty. In (b), the
> protocol is much less chatty, but forces the checkout to be the very last step,
> which has negative performance implications on checkouts with many large files.
>
> For instance, checking out several multi-gigabyte files one after the other
> means that a significant amount of time is lost while the filter has some of the
> items ready. Instead of checking them out as they become available, Git waits
> until the very end when they are all available.
>
> I think it would be preferable for the protocol to specify a sort of "done"
> signal against each path such that Git could check out delayed paths as they
> become available. If implemented this way, Git could checkout files
> asynchronously, while the filter continues to do work on the other end.

In v1 I implemented a) with the busy-loop problem in mind. My thinking was this:
If the filter sees at least one filter request twice then the filter knows that
Git has already requested all files that require filtering.
At that point the filter could just block the "delayed" answer to the latest
filter request until at least one of the previously delayed requests can be
fulfilled. Then the filter answers "delayed" to Git until Git requests the blob
that can be fulfilled. This process cycles until all requests can be fulfilled.
Wouldn't that work?

I think a "done" message by the filter is not easy. Right now the protocol
works in a mode where Git always asks and the filter always answers. I believe
changing the filter to be able to initiate a "done" message would complicate
the protocol.

> Additionally, the protocol should specify a sentinel "no more entries" value
> that could be sent from Git to the filter to signal that there are no more files
> to checkout. Some filters may implement mechanisms for converting files that
> require a signal to know when all files have been sent. Specifically, Git LFS
> (https://git-lfs.github.com) batches files to be transferred together, and needs
> to know when all files have been announced to truncate and send the last batch,
> if it is not yet full. I'm sure other filter implementations use a similar
> mechanism and would benefit from this as well.

I agree. I think the filter already has this info implicitly as explained above
but an explicit message would be better!

Thanks,
Lars
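
P.S. For illustration, the blocking idea from the reply above could be sketched
roughly like this. It is not code from the v1 patch: the class name
`DelayingFilter`, its methods, and the "success"/"delayed" status strings are
all made up for this sketch, and it is in Python only to keep it short. The
`mark_ready` caller stands in for whatever background worker the filter uses.

```python
import threading


class DelayingFilter:
    """Hypothetical filter-side bookkeeping; not the real filter protocol."""

    def __init__(self):
        self._cond = threading.Condition()
        self._seen = set()   # paths Git has asked about at least once
        self._ready = {}     # path -> filtered content, filled by workers
        self.all_announced = False

    def mark_ready(self, path, content):
        """Called by an (assumed) background worker when a blob is done."""
        with self._cond:
            self._ready[path] = content
            self._cond.notify_all()

    def handle_request(self, path):
        """Answer one request from Git with (status, content)."""
        with self._cond:
            if path in self._seen:
                # A repeated path implies Git has already requested every
                # file that needs filtering.
                self.all_announced = True
                if path not in self._ready:
                    # Hold back the answer until at least one delayed blob
                    # can be fulfilled, so Git does not busy-loop.
                    self._cond.wait_for(lambda: bool(self._ready))
            self._seen.add(path)
            if path in self._ready:
                self._seen.discard(path)
                return ("success", self._ready.pop(path))
            return ("delayed", None)
```

On the first pass every unfinished path gets "delayed" right away. On a repeat
request the answer is held back until something is ready, so Git still asks and
the filter still answers, just without the chatter. With an explicit "no more
entries" message the filter would not have to infer `all_announced` from a
repeated path.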