[Some of answers may get invalidated by v9] W dniu 30.09.2016 o 20:56, Lars Schneider pisze: >> On 27 Sep 2016, at 00:41, Jakub Narębski <jnareb@xxxxxxxxx> wrote: >> >> Part first of the review of 11/11. [...] >>> +to read a welcome response message ("git-filter-server") and exactly >>> +one protocol version number from the previously sent list. All further >> >> I guess that is to provide forward-compatibility, isn't it? Also, >> "Git expects..." probably means filter process MUST send, in the >> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning. > > True. I feel "expects" reads better but I am happy to change it if > you feel strong about it. I don't have strong opinion about this. I guess following the example of pkt-line format description may be a good idea. We are not writing an RFC here... but having be explicit is better than be good read :-P >>> + >>> +After the version negotiation Git sends a list of supported capabilities >>> +and a flush packet. >> >> Is it that Git SHOULD send list of ALL supported capabilities, or is >> it that Git SHOULD NOT send capabilities it does not support, and that >> it MAY send only those capabilities it needs (so for example if command >> uses only `smudge`, it may not send `clean`, so that filter driver doesn't >> need to initialize data it would not need). > > "After the version negotiation Git sends a list of all capabilities that > it supports and a flush packet." > > Better? Better. I wonder if it is a matter of current implementation, namely that at the point of code where Git is sending capabilities it doesn't know which of them it will be using; at least in some of cases. Because if it would be possible for Git to not send capabilities which it supports, but is sure that it would not be using for a given operation, then it would be good to do that. It might be that filter driver must do some prep work for each of capabilities, so skipping some of prep would make git faster. Though all this is for now theoretical musings... it might be an argument for such description of protocol so it does not prevent Git from sending only supported capabilities it needs. >> I wonder why it is "<capability>=true", and not "capability=<capability>". >> Is there a case where we would want to send "<capability>=false". Or >> is it to allow configurable / value based capabilities? Isn't it going >> a bit too far: is there even a hind of an idea for parametrize-able >> capability? YAGNI is a thing... > > Peff suggested that format and I think it is OK: > http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi@xxxxxxxxxxxxxxxxxxxxx/ It wouldn't kill you to summarize the argument, would it? >From what I understand the argument is that "<capability>=true" allows for simplist parsing into a [intermediate] hash, while the other proposal of using"capability=<capability>" would require something more sophisticated. And that it is better to be explicit rather than use synonyms / shortcuts for "<capability>". Though one can argue that "<foo>" is synonym / shortcut for "<foo>=true"; it would not complicate parsing much. Nb. we don't use trick of 'parse metadata to hash' in neither example, nor filter driver used in test... [...] >>> + >>> +After the filter has processed a blob it is expected to wait for >>> +the next "key=value" list containing a command. Git will close >>> +the command pipe on exit. The filter is expected to detect EOF >>> +and exit gracefully on its own. Is this still true? >> >> Good to have it documented. >> >> Anyway, as it is Git command that spawns the filter driver process, >> assuming that the filter process doesn't daemonize itself, wouldn't >> the operating system reap it after its parent process, that is the >> git command it invoked, dies? So detecting EOF is good, but not >> strictly necessary for simple filter that do not need to free >> its resources, or can leave freeing resources to the operating >> system? But I may be wrong here. > > The filter process runs independent of Git. Ah. So without some way to tell long-lived filter process that it can shut down, because no further data will be incoming, or killing it by Git, it would hang indefinitely? [...] >>> + } >>> + elsif ( $pkt_size > 4 ) { >> >> Isn't a packet of $pkt_size == 4 a valid packet, a keep-alive >> one? Or is it forbidden? > > "Implementations SHOULD NOT send an empty pkt-line ("0004")." > Source: Documentation/technical/protocol-common.txt All right. Not that we need keep-alive packet for communication between two processes on the same host. Though I wonder why this rule is here... [...] >>> +( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize"; >>> +( packet_txt_read() eq ( 0, "version=2" ) ) || die "bad version"; >>> +( packet_bin_read() eq ( 1, "" ) ) || die "bad version end"; >> >> Actually, it is overly strict. It should not fail if there >> are other "version=3", "version=4" etc. lines. > > True, but I think for an example this is OK. I'll add a note > to the file header. Anyway it would be better to have helper subroutine to get metadata lines (packets till flush) and parse them into array or a hash... >>> + >>> +while (1) { >>> + my ($command) = packet_txt_read() =~ /^command=([^=]+)$/; >>> + my ($pathname) = packet_txt_read() =~ /^pathname=([^=]+)$/; >> >> Do we require this order? If it is, is that explained in the >> documentation? > > Git sends that order right now but the filter should not rely > on that order. ...and the latter would make ignoring order of lines simpler. Though I wonder if we can ensure in the protocol documentation that those lines would always be send in this order. Best, -- Jakub Narębski