[Note that some of this might have been invalidated by v4] W dniu 01.08.2016 o 15:32, Lars Schneider pisze: >> On 31 Jul 2016, at 00:05, Jakub Narębski <jnareb@xxxxxxxxx> wrote: >> W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze: >>> Git starts the filter on first usage and expects a welcome >>> message, protocol version number, and filter capabilities >>> separated by spaces: >>> ------------------------ >>> packet: git< git-filter-protocol\n >>> packet: git< version 2\n >>> packet: git< capabilities clean smudge\n >> >> Sorry for going back and forth, but now I think that 'capabilities' are >> not really needed here, though they are in line with "version" in >> the second packet / line, namely "version 2". If it does not make >> parsing more difficult... > > I don't understand what you mean with "they are not really needed"? > The field is necessary to understand the protocol, no? > > In the last roll I added the "key=value" format to the protocol upon > yours and Peff's suggestion. Would it be OK to change the startup > sequence accordingly? > > packet: git< version=2\n > packet: git< capabilities=clean smudge\n With current implementation, Git checks second packet of the handshake for version, and third packet for capabilities. The "capabilities" or "capabilities=" is entirely redundant; it is the position of packet (it is the packet number) that matters. At least for now. The only thing that "version" in "version 2" and "capabilities" in "capabilities: clean smudge" helps is self-describing of the protocol. To really make use of them you would have to end handshake with flush packet, and do a parsing: loop over every packet, and match known patterns. Well, perhaps with exception of known header: it doesn't makes sense to have "version N" anywhere else than second packet, and it doesn't makes sense to repeat it. We also don't want to proliferate packets unnecessarily. Each packet is a bit (a tiny bit) of a performance hit. >>> ------------------------ >>> Supported filter capabilities are "clean", "smudge", "stream", >>> and "shutdown". >> >> I'd rather put "stream" and "shutdown" capabilities into separate >> patches, for easier review. > > I agree with "shutdown". I think I would like to remove the "stream" > option and make it the default for the following reasons: > > (1) As you mentioned elsewhere, "stream" is not really streaming at this > point because we don't read/write in parallel. We could, following the example of original per-file filter drivers. It is as simple as starting writer using start_async(), as if we did writing from Git in a child process. Though that might be left for later (assuming that protocol is flexible enough), as synchronous protocol (write, then read) is a bit simpler to implement. > (2) Junio and you pointed out that if we transmit size and flush packet > then we have redundancy in the protocol. Providing size upfront can be a hint for filter or Git. For example HTTP provides Content-Length: header, though it is not strictly necessary. > (3) With the newly introduced "success"/"reject"/"failure" packet at the > end of a filter operation, a filter process has a way to signal Git that > something went wrong. Initially I had the idea that a filter process just > stops writing and Git would detect the mismatch between expected bytes > and received bytes. But the final status packet is a much clearer solution. The solution with stopping writing wouldn't work, I don't think. > (4) Maintaining two slightly different protocols is a waste of resources > and only increases the size of this (already large) patch. Right, better to design and implement basic protocol, taking care that it is extensible, and only then add to it. > My only argument for the size packet was that this allows efficient buffer > allocation. However, in non of my benchmarks this was actually a problem. > Therefore this is probably a epsilon optimization and should be removed. > > OK with everyone? All right. >>> After the filter has processed a blob it is expected to wait for >>> the next command. A demo implementation can be found in >>> `t/t0021/rot13-filter.pl` located in the Git core repository. >> >> If filter does not support "shutdown" capability (or if said >> capability is postponed for later patch), it should behave sanely >> when Git command reaps it (SIGTERM + wait + SIGKILL?, SIGCHLD?). > > How would you do this? Don't you think the current solution is > good enough for processes that don't need a proper shutdown? Actually... couldn't filter driver register atexit() / signal handler to do a clean exit, if it is needed? >> I wonder if it would be worth it to explain the reasoning behind >> this solution and show alternate ones. >> >> * Using a separate variable to signal that filters are invoked >> per-command rather than per-file, and use pkt-line interface, >> like boolean-valued `useProtocol`, or `protocolVersion` set >> to '2' or 'v2', or `persistence` set to 'per-command', there >> is high risk of user's trying to use exiting one-shot per-file >> filters... and Git hanging. >> >> * Using new variables for each capability, e.g. `processSmudge` >> and `processClean` would lead to explosion of variable names; >> I think. >> >> * Current solution of using `process` in addition to `clean` >> and `smudge` clearly says that you need to use different >> command for per-file (`clean` and `smudge`), and per-command >> filter, while allowing to use them together. >> >> The possible disadvantage is Git command starting `process` >> filter, only to see that it doesn't offer required capability, >> for example offering only "clean" but not "smudge". There >> is simple workaround - set `smudge` variable (same as not >> present capability) to empty string. > > If you think it is necessary to have this discussion in the > commit message, then I will add it. I think it would be good idea (not necessary, but helpful), though possibly not in such exacting detail. Just why this one, one sentence or more. >>> +single filter invocation for the entire life of a single Git >>> +command. This is achieved by using the following packet >>> +format (pkt-line, see protocol-common.txt) based protocol over >> >> Can we linkgit-it (to technical documentation)? > > I don't think that is possible because it was never done. See: > git grep "linkgit:tech" A pity. Well, not your problem, anyway. >>> +Git starts the filter on first usage and expects a welcome >> >> Is "usage" here correct? Perhaps it would be more readable >> to say that Git starts filter when encountering first file >> that needs cleaning or smudgeing. > > OK. How about this: > > Git starts the filter when it encounters the first file > that needs to be cleaned or smudged. After the filter started > Git expects a welcome message, protocol version number, and > filter capabilities separated by spaces: Better. >>> + >>> +After the filter has processed a blob it is expected to wait for >>> +the next command. A demo implementation can be found in >>> +`t/t0021/rot13-filter.pl` located in the Git core repository. >> >> It is actually in Git sources. Is it the best way to refer to >> such files? > > Well, I could add a github.com link but I don't think everyone > would like that. What would you suggest? Sorry, I wasn't clear. What I meant is if "<file> located in the Git core repository" is the best way to refer to such files, and if we could do better. But I think it is all right as it is. Later we might want to provide some example filter.<driver>.process filters e.g. in contrib/. But that's for the future. >>> + >>> +Please note that you cannot use an existing filter.<driver>.clean >>> +or filter.<driver>.smudge command as filter.<driver>.process >>> +command. As soon as Git would detect a file that needs to be >>> +processed by this filter, it would stop responding. >> >> This isn't. > > Would that be better? > > > Please note that you cannot use an existing `filter.<driver>.clean` > or `filter.<driver>.smudge` command as `filter.<driver>.process` > command because the former two use a different inter process > communication protocol than the latter one. As soon as Git would detect > a file that needs to be processed by such an invalid "process" filter, > it would wait for a proper protocol handshake and appear "hanging". This is better. -- Jakub Narębski -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html