[Excuse me replying to myself, but there are a few things I forgot, or realized only later] W dniu 31.07.2016 o 00:05, Jakub Narębski pisze: > W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze: >> From: Lars Schneider <larsxschneider@xxxxxxxxx> >> >> Git's clean/smudge mechanism invokes an external filter process for every >> single blob that is affected by a filter. If Git filters a lot of blobs >> then the startup time of the external filter processes can become a >> significant part of the overall Git execution time. >> >> This patch adds the filter.<driver>.process string option which, if used, >> keeps the external filter process running and processes all blobs with >> the following packet format (pkt-line) based protocol over standard input >> and standard output. > > I think it would be nice to have here at least summary of the benchmarks > you did in https://github.com/github/git-lfs/pull/1382 Note that this feature is especially useful if startup time is long, that is if you are using an operating system with costly fork / new process startup time like MS Windows (which you have mentioned), or writing filter in a programming language with large startup time like Java or Python (the latter may have changed since). https://gnustavo.wordpress.com/2012/06/28/programming-languages-start-up-times/ [...] > I was thinking about having possible responses to receiving file > contents (or starting receiving in the streaming case) to be: > > packet: git< ok size=7\n (or "ok 7\n", if size is known) > > or > > packet: git< ok\n (if filter does not know size upfront) > > or > > packet: git< fail <msg>\n (or just "fail" + packet with msg) > > The last would be when filter knows upfront that it cannot perform > the operation. Though sending an empty file with non-"success" final > would work as well. [...] >> In case the filter cannot process the content, it is expected >> to respond with the result content size 0 (only if "stream" is >> not defined) and a "reject" packet. >> ------------------------ >> packet: git< size=0\n (omitted with capability "stream") >> packet: git< reject\n >> ------------------------ > > This is *wrong* idea! Empty file, with size=0, can be a perfectly > legitimate response. Actually, I think I have misunderstood your intent. If you want to have simpler protocol, with only one place to signal errors, that is after sending a response, then proper way of signaling the error condition would be to send empty file and then "reject" instead of "success": packet: git< size=0\n (omitted with capability "stream") packet: git< 0000 (we need this flush packet) packet: git< reject\n Otherwise in the case without size upfront (capability "stream") file with contents "reject" would be mistaken for the "reject" packet. See below for proposal with two places to signal errors: before sending first byte, and after. NOTE: there is a bit of mixed and possibly confusing notation, that is 0000 is flush packet, not packet with 0000 as content. Perhaps write pkt-line in full? [...] >> --- >> Documentation/gitattributes.txt | 84 ++++++++- >> convert.c | 400 +++++++++++++++++++++++++++++++++++++-- >> t/t0021-conversion.sh | 405 ++++++++++++++++++++++++++++++++++++++++ >> t/t0021/rot13-filter.pl | 177 ++++++++++++++++++ >> 4 files changed, 1053 insertions(+), 13 deletions(-) >> create mode 100755 t/t0021/rot13-filter.pl Wouldn't it be better for easier review to split it into separate patches? Perhaps at least the new test... [...] > I would assume that we have two error conditions. > > First situation is when the filter knows upfront (after receiving name > and size of file, and after receiving contents for not-streaming filters) > that it cannot process the file (like e.g. LFS filter with artifactory > replica/shard being a bit behind master, and not including contents of > the file being filtered). > > My proposal is to reply with "fail" _in place of_ size of reply: > > packet: git< fail\n (any case: size known or not, stream or not) > > It could be "reject", or "error" instead of "fail". > > > Another situation is if filter encounters error during output, > either with streaming filter (or non-stream, but not storing whole > input upfront) realizing in the middle of output that there is something > wrong with input (e.g. converting between encoding, and encountering > character that cannot be represented in output encoding), or e.g. filter > process being killed, or network connection dropping with LFS filter, etc. > The filter has send some packets with output already. In this case > filter should flush, and send "reject" or "error" packet. > > <error condition> > packet: git< "0000" (flush packet) > packet: git< reject\n > > Should there be a place for an error message, or would standard error > (stderr) be used for this? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html