On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote: > But we actually _do_ want fully buffered from a performance angle. > Especially for the big stuff, which is usually the *diffs*, not the commit > messages. Not so much an issue with git-rev-list, but with "git log -p" > you would normally not want it line-buffered, and it's actually much nicer > to let it be fully buffered and then do a flush at the end. > > Even pipes are often used for "throughput" stuff if you end up doing some > post-processing (ie "git log -p | gather-statistics"), and yes, I actually > do things like that - it's nice for things like looking at how many lines > have been added during the last release cycle > > git log -p v2.6.21.. | grep '^+' | wc -l > > and I'd really like thigns like that to be close to optimal. > > How much the system call overhead is, I don't know, though. So it might be > worth testing out.... So just for yuks, I devised the following patch, and did some measurements.... For the above pipeline, the result was hardly worth mentioning: With flushing: % time git log -p v2.6.21.. | grep '^+' | wc -l real 0m22.330s user 0m21.512s sys 0m0.807s # of write() system calls according to strace -c: 15167 Without flushing: % time git log -p v2.6.21.. | grep '^+' | wc -l real 0m22.367s user 0m21.355s sys 0m0.720s # of write() system calls according to strace -c: 11373 So here's the worst case: % time git-rev-list HEAD > /dev/null real 0m1.575s user 0m1.477s sys 0m0.053s # of write() system calls according to strace -c: 582 % (export GIT_NEVER_FLUSH_STDOUT=t; time git-rev-list HEAD > /dev/null) real 0m1.535s user 0m1.463s sys 0m0.027s # of write() system calls according to strace -c: 58055 > Under Linux, you'll probably have a fairly hard time > seeing any difference, but under other OS's you have both system call > latency issues *and* possible scheduling issues (ie the above kind of > pipeline can act very differently from a scheduling standpoint if you send > lots of small things rather than buffer things a bit on the generating > side) Indeed. So it's not at all clear this patch is worth applying, but maybe it would make a difference on some other OS; of course, this patch also obviates the original intent of Jim Meyering's patch, since it means that we won't print a useful error message if stdout has been redirected to a file and the write returns ENOSPC, since we won't be fflush() when stdout is redirected to a file. The added fflush() calls to the incremental git-blame and the git-log-*/git-whatchanged might make it worthwhile for tools that depend on those outputs and want faster user response time. So maybe adding the fflush() call might be worthwhile for those programs. - Ted commit 7f483ec6366f62d52199e3edefa292a110fcb5c8 Author: Theodore Ts'o <tytso@xxxxxxx> Date: Thu Jun 28 14:10:58 2007 -0400 Don't fflush(stdout) when it's not helpful This patch arose from a discussion started by Jim Meyering's patch whose intention was to provide better diagnostics for failed writes. Linus proposed a better way to do things, which also had the added benefit that adding a fflush() to git-log-* operations and incremental git-blame operations could improve interactive respose time feel, at the cost of making things a bit slower when we aren't piping the output to a downstream program. This patch skips the fflush() calls when stdout is a regular file, or if the environment variable GIT_NEVER_FLUSH_STDOUT is set. This latter can speed up a command such as: (export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l) a tiny amount. Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx> diff --git a/builtin-blame.c b/builtin-blame.c index f7e2c13..da23a6f 100644 --- a/builtin-blame.c +++ b/builtin-blame.c @@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent) printf("boundary\n"); } write_filename_info(suspect->path); + maybe_flush_or_die(stdout, "stdout"); } } diff --git a/builtin-rev-list.c b/builtin-rev-list.c index 813aadf..86db8b0 100644 --- a/builtin-rev-list.c +++ b/builtin-rev-list.c @@ -100,7 +100,7 @@ static void show_commit(struct commit *commit) printf("%s%c", buf, hdr_termination); free(buf); } - fflush(stdout); + maybe_flush_or_die(stdout, "stdout"); if (commit->parents) { free_commit_list(commit->parents); commit->parents = NULL; diff --git a/cache.h b/cache.h index ed83d92..0525c4e 100644 --- a/cache.h +++ b/cache.h @@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME]; extern const char *git_commit_encoding; extern const char *git_log_output_encoding; +/* IO helper functions */ +extern void maybe_flush_or_die(FILE *, const char *); extern int copy_fd(int ifd, int ofd); extern int read_in_full(int fd, void *buf, size_t count); extern int write_in_full(int fd, const void *buf, size_t count); diff --git a/log-tree.c b/log-tree.c index 0cf21bc..ced3f33 100644 --- a/log-tree.c +++ b/log-tree.c @@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) shown = 1; } opt->loginfo = NULL; + maybe_flush_or_die(stdout, "stdout"); return shown; } diff --git a/write_or_die.c b/write_or_die.c index 5c4bc85..2cebeb5 100644 --- a/write_or_die.c +++ b/write_or_die.c @@ -1,5 +1,43 @@ #include "cache.h" +/* + * Some cases use stdio, but want to flush after the write + * to get error handling (and to get better interactive + * behaviour - not buffering excessively). + * + * Of course, if the flush happened within the write itself, + * we've already lost the error code, and cannot report it any + * more. So we just ignore that case instead (and hope we get + * the right error code on the flush). + * + * If the file handle is stdout, and stdout is a file, then skip the + * flush entirely since it's not needed. + */ +void maybe_flush_or_die(FILE *f, const char *desc) +{ + static int stdout_is_file = -1; + struct stat st; + + if (f == stdout) { + if (stdout_is_file < 0) { + if (getenv("GIT_NEVER_FLUSH_STDOUT") || + ((fstat(fileno(stdout), &st) == 0) && + S_ISREG(st.st_mode))) + stdout_is_file = 1; + else + stdout_is_file = 0; + } + if (stdout_is_file) + return; + } + if (fflush(f)) { + if (errno == EPIPE) + exit(0); + die("write failure on %s: %s", + desc, strerror(errno)); + } +} + int read_in_full(int fd, void *buf, size_t count) { char *p = buf; - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html