Joey Hess venit, vidit, dixit 16.06.2016 22:32: > This adds new smudge-to-file and clean-from-file filter commands, > which are similar to smudge and clean but allow direct access to files on > disk. > > In smudge-to-file and clean-from-file, "%p" is expanded to the path to the > file that should be cleaned from, or smudged to. > > This interface can be much more efficient when operating on large files, > because the whole file content does not need to be streamed through the > filter. It even allows for things like clean-from-file commands that avoid > reading the whole content of the file, and for smudge-to-file commands that > populate a work tree file using an efficient Copy On Write operation. > > The new filter commands will not be used for all filtering. They are > efficient to use when git add is adding a file, or when the work tree is > being updated, but not a good fit when git is internally filtering blob > objects in memory for eg, a diff. > > So, a user who wants to use smudge-to-file should also provide a smudge > command to be used in cases where smudge-to-file is not used. And ditto > with clean-from-file and clean. To avoid foot-shooting configurations, the > new commands are not used unless the old commands are also configured. I'm not sure this will save all feet. I foresee "why is smudge-to-file not doing anything" reports... In addition, it opens the way to doing completely different things in smudge and smudge-to-file - which partly is intentional, of course. Do you make any promises that %p is a seekable file? > That also ensures that a filter driver configuration that includes these > new commands will work, although less efficiently, when used with an older > version of git that does not support them. > > Signed-off-by: Joey Hess <joeyh@xxxxxxxxxx> > --- > Documentation/config.txt | 27 +++++++--- > Documentation/gitattributes.txt | 37 ++++++++++++++ > convert.c | 107 ++++++++++++++++++++++++++++++++++------ > convert.h | 10 ++++ > 4 files changed, 160 insertions(+), 21 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 2e1b2e4..bbb9296 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1299,14 +1299,29 @@ format.useAutoBase:: > format-patch by default. > > filter.<driver>.clean:: > - The command which is used to convert the content of a worktree > - file to a blob upon checkin. See linkgit:gitattributes[5] for > - details. > + The command which is used on checkin to convert the content of > + a worktree file (provided on stdin) to a blob (written to stdout). > + See linkgit:gitattributes[5] for details. > > filter.<driver>.smudge:: > - The command which is used to convert the content of a blob > - object to a worktree file upon checkout. See > - linkgit:gitattributes[5] for details. > + The command which is used on checkout to convert the content of a > + blob object (provided on stdin) to a worktree file (written to > + stdout). > + See linkgit:gitattributes[5] for details. > + > +filter.<driver>.clean-from-file:: > + Optional command which is used on checkin to convert the content > + of a worktree file, which can be read from disk, to a blob > + (written to stdout). > + Only used when filter.<driver>.clean is also configured. > + See linkgit:gitattributes[5] for details. > + > +filter.<driver>.smudge-to-file:: > + Optional command which is used to convert the content of a blob > + object (provided on stdin) to a worktree file, writing directly > + to the file. > + Only used when filter.<driver>.clean is also configured. > + See linkgit:gitattributes[5] for details. > > fsck.<msg-id>:: > Allows overriding the message type (error, warn or ignore) of a > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index e077cc9..32621e7 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -378,6 +378,43 @@ Note that the "%f" is the name of a file in the git repository; the > corresponding file on disk may not exist, or may have unrelated contents to > what git is filtering. > > +There are two extra commands "clean-from-file" and "smudge-to-file", which > +can optionally be set in a filter driver. These are similar to the "clean" > +and "smudge" commands, but avoid needing to pipe the contents of files > +through the filters, and instead reading/writing files in the filesystem. > +This can be more efficient when using filters with large files that are not > +directly stored in the repository. > + > +Sequence "%p" on the "clean-from-file" and "smudge-to-file" command line > +is replaced with the path to a file that the filter can access. > + > +In the "clean-from-file" command, "%p" is the path to the file that > +it should clean. Like the "clean" command, it should output the cleaned > +version to standard output. > + > +In the "smudge-from-file" command, "%p" is the path to the file that it > +should write to. (This file will already exist, as an empty file that can > +be written to or replaced.) Like the "smudge" command, "smudge-from-file" > +is fed the blob object from its standard input. > + > +Some git operations that need to apply filters cannot use "clean-from-file" > +and "smudge-to-file", since the files are not present to disk. So, to avoid > +inconsistent behavior, "clean-from-file" will only be used if "clean" is > +also configured, and "smudge-to-file" will only be used if "smudge" is also > +configured. > + > +An example large file storage filter driver using "*-from-file" follows: > + > +------------------------ > +[filter "bigfiles"] > + clean = store-bigfile --from-stdin > + clean-from-file = store-bigfile %p > + smudge = retrieve-bigfile --to-stdout > + smudge-to-file = retrieve-bigfile %p > + required > + > +------------------------ > + > Interaction between checkin/checkout attributes > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > diff --git a/convert.c b/convert.c > index b1614bf..bfb7578 100644 > --- a/convert.c > +++ b/convert.c > @@ -360,7 +360,8 @@ struct filter_params { > unsigned long size; > int fd; > const char *cmd; > - const char *path; > + const char *path; /* Path within the git repository */ > + const char *fspath; /* Path to file on disk */ > }; > > static int filter_buffer_or_fd(int in, int out, void *data) > @@ -376,16 +377,22 @@ static int filter_buffer_or_fd(int in, int out, void *data) > /* apply % substitution to cmd */ > struct strbuf cmd = STRBUF_INIT; > struct strbuf path = STRBUF_INIT; > + struct strbuf fspath = STRBUF_INIT; > struct strbuf_expand_dict_entry dict[] = { > { "f", NULL, }, > + { (params->fspath ? "p" : NULL), NULL, }, > { NULL, NULL, }, > }; > > /* quote the path to preserve spaces, etc. */ > sq_quote_buf(&path, params->path); > dict[0].value = path.buf; > + if (params->fspath) { > + sq_quote_buf(&fspath, params->fspath); > + dict[1].value = fspath.buf; > + } > > - /* expand all %f with the quoted path */ > + /* expand all %f and %p with the quoted path and fspath */ > strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict); > strbuf_release(&path); > > @@ -427,7 +434,8 @@ static int filter_buffer_or_fd(int in, int out, void *data) > return (write_err || status); > } > > -static int apply_filter(const char *path, const char *src, size_t len, int fd, > +static int apply_filter(const char *path, const char *fspath, > + const char *src, size_t len, int fd, > struct strbuf *dst, const char *cmd) > { > /* > @@ -456,6 +464,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd, > params.fd = fd; > params.cmd = cmd; > params.path = path; > + params.fspath = fspath; > > fflush(NULL); > if (start_async(&async)) > @@ -486,6 +495,8 @@ static struct convert_driver { > struct convert_driver *next; > const char *smudge; > const char *clean; > + const char *smudge_to_file; > + const char *clean_from_file; > int required; > } *user_convert, **user_convert_tail; > > @@ -512,8 +523,9 @@ static int read_convert_config(const char *var, const char *value, void *cb) > } > > /* > - * filter.<name>.smudge and filter.<name>.clean specifies > - * the command line: > + * filter.<name>.smudge, filter.<name>.clean, > + * filter.<name>.smudge-to-file, filter.<name>.clean-from-file > + * specifies the command line: > * > * command-line > * > @@ -526,6 +538,12 @@ static int read_convert_config(const char *var, const char *value, void *cb) > if (!strcmp("clean", key)) > return git_config_string(&drv->clean, var, value); > > + if (!strcmp("smudge-to-file", key)) > + return git_config_string(&drv->smudge_to_file, var, value); > + > + if (!strcmp("clean-from-file", key)) > + return git_config_string(&drv->clean_from_file, var, value); > + > if (!strcmp("required", key)) { > drv->required = git_config_bool(var, value); > return 0; > @@ -823,7 +841,35 @@ int would_convert_to_git_filter_fd(const char *path) > if (!ca.drv->required) > return 0; > > - return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean); > + return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); > +} > + > +int can_clean_from_file(const char *path) > +{ > + struct conv_attrs ca; > + > + convert_attrs(&ca, path); > + if (!ca.drv) > + return 0; > + > + /* Only use the clean-from-file filter when the clean filter is also > + * configured. > + */ > + return (ca.drv->clean_from_file && ca.drv->clean); > +} > + > +int can_smudge_to_file(const char *path) > +{ > + struct conv_attrs ca; > + > + convert_attrs(&ca, path); > + if (!ca.drv) > + return 0; > + > + /* Only use the smudge-to-file filter when the smudge filter is also > + * configured. > + */ > + return (ca.drv->smudge_to_file && ca.drv->smudge); > } > > const char *get_convert_attr_ascii(const char *path) > @@ -866,7 +912,7 @@ int convert_to_git(const char *path, const char *src, size_t len, > required = ca.drv->required; > } > > - ret |= apply_filter(path, src, len, -1, dst, filter); > + ret |= apply_filter(path, NULL, src, len, -1, dst, filter); > if (!ret && required) > die("%s: clean filter '%s' failed", path, ca.drv->name); > > @@ -891,14 +937,33 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst, > assert(ca.drv); > assert(ca.drv->clean); > > - if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean)) > + if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean)) > die("%s: clean filter '%s' failed", path, ca.drv->name); > > crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe); > ident_to_git(path, dst->buf, dst->len, dst, ca.ident); > } > > -static int convert_to_working_tree_internal(const char *path, const char *src, > +void convert_to_git_filter_from_file(const char *path, struct strbuf *dst, > + enum safe_crlf checksafe) > +{ > + struct conv_attrs ca; > + convert_attrs(&ca, path); > + > + assert(ca.drv); > + assert(ca.drv->clean); > + assert(ca.drv->clean_from_file); > + > + if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file)) > + die("%s: clean-from-file filter '%s' failed", path, ca.drv->name); > + > + crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe); > + ident_to_git(path, dst->buf, dst->len, dst, ca.ident); > +} > + > +static int convert_to_working_tree_internal(const char *path, > + const char *destpath, > + const char *src, > size_t len, struct strbuf *dst, > int normalizing) > { > @@ -909,7 +974,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > > convert_attrs(&ca, path); > if (ca.drv) { > - filter = ca.drv->smudge; > + if (destpath) > + filter = ca.drv->smudge_to_file; > + else > + filter = ca.drv->smudge; > required = ca.drv->required; > } > > @@ -920,7 +988,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > } > /* > * CRLF conversion can be skipped if normalizing, unless there > - * is a smudge filter. The filter might expect CRLFs. > + * is a filter. The filter might expect CRLFs. > */ > if (filter || !normalizing) { > ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action); > @@ -930,21 +998,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > } > } > > - ret_filter = apply_filter(path, src, len, -1, dst, filter); > + ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter); > if (!ret_filter && required) > - die("%s: smudge filter %s failed", path, ca.drv->name); > + die("%s: %s filter %s failed", path, destpath ? "smudge-to-file" : "smudge", ca.drv->name); > > return ret | ret_filter; > } > > int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst) > { > - return convert_to_working_tree_internal(path, src, len, dst, 0); > + return convert_to_working_tree_internal(path, NULL, src, len, dst, 0); > +} > + > +int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len) > +{ > + struct strbuf output = STRBUF_INIT; > + int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0); > + /* The smudge-to-file filter stdout is not used. */ > + strbuf_release(&output); > + return ret; > } > > int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) > { > - int ret = convert_to_working_tree_internal(path, src, len, dst, 1); > + int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1); > if (ret) { > src = dst->buf; > len = dst->len; > diff --git a/convert.h b/convert.h > index ccf436b..53e1474 100644 > --- a/convert.h > +++ b/convert.h > @@ -41,6 +41,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len, > struct strbuf *dst, enum safe_crlf checksafe); > extern int convert_to_working_tree(const char *path, const char *src, > size_t len, struct strbuf *dst); > +extern int convert_to_working_tree_filter_to_file(const char *path, > + const char *destpath, > + const char *src, > + size_t len); > extern int renormalize_buffer(const char *path, const char *src, size_t len, > struct strbuf *dst); > static inline int would_convert_to_git(const char *path) > @@ -52,6 +56,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd, > struct strbuf *dst, > enum safe_crlf checksafe); > extern int would_convert_to_git_filter_fd(const char *path); > +/* Precondition: can_clean_from_file(path) == true */ > +extern void convert_to_git_filter_from_file(const char *path, > + struct strbuf *dst, > + enum safe_crlf checksafe); > +extern int can_clean_from_file(const char *path); > +extern int can_smudge_to_file(const char *path); > > /***************************************************************** > * > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html