> On 12 Jul 2016, at 00:45, Joey Hess <joeyh@xxxxxxxxxx> wrote: > > This adds new smudgeToFile and cleanFromFile filter commands, > which are similar to smudge and clean but allow direct access to files on > disk. > > This interface can be much more efficient when operating on large files, > because the whole file content does not need to be streamed through the > filter. It even allows for things like cleanFromFile commands that avoid > reading the whole content of the file, and for smudgeToFile commands that > populate a work tree file using an efficient Copy On Write operation. > > The new filter commands will not be used for all filtering. They are > efficient to use when git add is adding a file, or when the work tree is > being updated, but not a good fit when git is internally filtering blob > objects in memory for eg, a diff. > > So, a user who wants to use smudgeToFile should also provide a smudge > command to be used in cases where smudgeToFile is not used. And ditto > with cleanFromFile and clean. To avoid foot-shooting configurations, the > new commands are not used unless the old commands are also configured. > > That also ensures that a filter driver configuration that includes these > new commands will work, although less efficiently, when used with an older > version of git that does not support them. > > Signed-off-by: Joey Hess <joeyh@xxxxxxxxxx> > --- > Documentation/config.txt | 18 ++++++- > Documentation/gitattributes.txt | 37 ++++++++++++++ > convert.c | 111 +++++++++++++++++++++++++++++++++++----- > convert.h | 10 ++++ > 4 files changed, 160 insertions(+), 16 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 19493aa..a55bed8 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -1325,15 +1325,29 @@ format.useAutoBase:: > format-patch by default. > > filter.<driver>.clean:: > - The command which is used to convert the content of a worktree > + The command which is used as a filter to convert the content of a worktree > file to a blob upon checkin. See linkgit:gitattributes[5] for > details. > > filter.<driver>.smudge:: > - The command which is used to convert the content of a blob > + The command which is used as a filter to convert the content of a blob > object to a worktree file upon checkout. See > linkgit:gitattributes[5] for details. > > +filter.<driver>.cleanFromFile:: > + Similar to filter.<driver>.clean but the specified command > + directly accesses a worktree file on disk, rather than > + receiving the file content from standard input. > + Only used when filter.<driver>.clean is also configured. > + See linkgit:gitattributes[5] for details. > + > +filter.<driver>.smudgeToFile:: > + Similar to filter.<driver>.smudge but the specified command > + writes the content of a blob directly to a worktree file, > + rather than to standard output. > + Only used when filter.<driver>.smudge is also configured. > + See linkgit:gitattributes[5] for details. > + > fsck.<msg-id>:: > Allows overriding the message type (error, warn or ignore) of a > specific message ID such as `missingEmail`. > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 197ece8..a58aafc 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands > should not try to access the file on disk, but only act as filters on the > content provided to them on standard input. > > +There are two extra commands "cleanFromFile" and "smudgeToFile", which > +can optionally be set in a filter driver. These are similar to the "clean" > +and "smudge" commands, but avoid needing to pipe the contents of files > +through the filters, and instead read/write files in the filesystem. > +This can be more efficient when using filters with large files that are not > +directly stored in the repository. > + > +Both "cleanFromFile" and "smudgeToFile" are provided a path as an > +added parameter after the configured command line. > + > +The "cleanFromFile" command is provided the path to the file that > +it should clean. Like the "clean" command, it should output the cleaned > +version to standard output. > + > +The "smudgeToFile" command is provided a path to the file that it > +should write to. (This file will already exist, as an empty file that can > +be written to or replaced.) Like the "smudge" command, "smudgeToFile" > +is fed the blob object from its standard input. > + > +Some git operations that need to apply filters cannot use "cleanFromFile" > +and "smudgeToFile", since the files are not present to disk. So, to avoid > +inconsistent behavior, "cleanFromFile" will only be used if "clean" is > +also configured, and "smudgeToFile" will only be used if "smudge" is also > +configured. > + > +An example large file storage filter driver using cleanFromFile and > +smudgeToFile follows: > + > +------------------------ > +[filter "bigfiles"] > + clean = store-bigfile --from-stdin > + cleanFromFile = store-bigfile --from-file > + smudge = retrieve-bigfile --to-stdout > + smudgeToFile = retrieve-bigfile --to-file > + required Minor nit: Do we need "required" in the minimal example? Plus, all test cases use "required = true" (I just learned about that short-hand version). > +------------------------ > + > Interaction between checkin/checkout attributes > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > diff --git a/convert.c b/convert.c > index 214c99f..eb7774f 100644 > --- a/convert.c > +++ b/convert.c > @@ -358,7 +358,8 @@ struct filter_params { > unsigned long size; > int fd; > const char *cmd; > - const char *path; > + const char *path; /* Path within the git repository */ > + const char *fspath; /* Path to file on disk */ Good comment here. However, I wonder if these two "path" variables could be confused in other parts of this file. I think with an additional apply_filter function you could avoid this confusion and it would be easier for me to apply my "long running filter patch" on top :-) https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L1143-L1146 https://github.com/larsxschneider/git/blob/74e22bd4e0b505785fa8ffc2ef15721909635d1c/convert.c#L402-L488 Thanks for this patch series, Lars > }; > > static int filter_buffer_or_fd(int in, int out, void *data) > @@ -387,6 +388,15 @@ static int filter_buffer_or_fd(int in, int out, void *data) > strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict); > strbuf_release(&path); > > + /* append fspath to the command if it's set, separated with a space */ > + if (params->fspath) { > + struct strbuf fspath = STRBUF_INIT; > + sq_quote_buf(&fspath, params->fspath); > + strbuf_addstr(&cmd, " "); > + strbuf_addbuf(&cmd, &fspath); > + strbuf_release(&fspath); > + } > + > argv[0] = cmd.buf; > > child_process.argv = argv; > @@ -425,7 +435,8 @@ static int filter_buffer_or_fd(int in, int out, void *data) > return (write_err || status); > } > > -static int apply_filter(const char *path, const char *src, size_t len, int fd, > +static int apply_filter(const char *path, const char *fspath, > + const char *src, size_t len, int fd, > struct strbuf *dst, const char *cmd) > { > /* > @@ -454,6 +465,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd, > params.fd = fd; > params.cmd = cmd; > params.path = path; > + params.fspath = fspath; > > fflush(NULL); > if (start_async(&async)) > @@ -484,6 +496,8 @@ static struct convert_driver { > struct convert_driver *next; > const char *smudge; > const char *clean; > + const char *smudge_to_file; > + const char *clean_from_file; > int required; > } *user_convert, **user_convert_tail; > > @@ -510,8 +524,9 @@ static int read_convert_config(const char *var, const char *value, void *cb) > } > > /* > - * filter.<name>.smudge and filter.<name>.clean specifies > - * the command line: > + * filter.<name>.smudge, filter.<name>.clean, > + * filter.<name>.smudgeToFile, filter.<name>.cleanFromFile > + * specifies the command line: > * > * command-line > * > @@ -524,6 +539,12 @@ static int read_convert_config(const char *var, const char *value, void *cb) > if (!strcmp("clean", key)) > return git_config_string(&drv->clean, var, value); > > + if (!strcmp("smudgetofile", key)) > + return git_config_string(&drv->smudge_to_file, var, value); > + > + if (!strcmp("cleanfromfile", key)) > + return git_config_string(&drv->clean_from_file, var, value); > + > if (!strcmp("required", key)) { > drv->required = git_config_bool(var, value); > return 0; > @@ -821,7 +842,37 @@ int would_convert_to_git_filter_fd(const char *path) > if (!ca.drv->required) > return 0; > > - return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean); > + return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); > +} > + > +int can_clean_from_file(const char *path) > +{ > + struct conv_attrs ca; > + > + convert_attrs(&ca, path); > + if (!ca.drv) > + return 0; > + > + /* > + * Only use the cleanFromFile filter when the clean filter is also > + * configured. > + */ > + return (ca.drv->clean_from_file && ca.drv->clean); > +} > + > +int can_smudge_to_file(const char *path) > +{ > + struct conv_attrs ca; > + > + convert_attrs(&ca, path); > + if (!ca.drv) > + return 0; > + > + /* > + * Only use the smudgeToFile filter when the smudge filter is also > + * configured. > + */ > + return (ca.drv->smudge_to_file && ca.drv->smudge); > } > > const char *get_convert_attr_ascii(const char *path) > @@ -864,7 +915,7 @@ int convert_to_git(const char *path, const char *src, size_t len, > required = ca.drv->required; > } > > - ret |= apply_filter(path, src, len, -1, dst, filter); > + ret |= apply_filter(path, NULL, src, len, -1, dst, filter); > if (!ret && required) > die("%s: clean filter '%s' failed", path, ca.drv->name); > > @@ -889,14 +940,34 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst, > assert(ca.drv); > assert(ca.drv->clean); > > - if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean)) > + if (!apply_filter(path, NULL, NULL, 0, fd, dst, ca.drv->clean)) > die("%s: clean filter '%s' failed", path, ca.drv->name); > > crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe); > ident_to_git(path, dst->buf, dst->len, dst, ca.ident); > } > > -static int convert_to_working_tree_internal(const char *path, const char *src, > +void convert_to_git_filter_from_file(const char *path, struct strbuf *dst, > + enum safe_crlf checksafe) > +{ > + struct conv_attrs ca; > + convert_attrs(&ca, path); > + > + assert(ca.drv); > + assert(ca.drv->clean); > + assert(ca.drv->clean_from_file); > + > + if (!apply_filter(path, path, "", 0, -1, dst, ca.drv->clean_from_file)) > + die("%s: cleanFromFile filter '%s' failed", path, ca.drv->name); > + > + crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, > + checksafe); > + ident_to_git(path, dst->buf, dst->len, dst, ca.ident); > +} > + > +static int convert_to_working_tree_internal(const char *path, > + const char *destpath, > + const char *src, > size_t len, struct strbuf *dst, > int normalizing) > { > @@ -907,7 +978,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > > convert_attrs(&ca, path); > if (ca.drv) { > - filter = ca.drv->smudge; > + if (destpath) > + filter = ca.drv->smudge_to_file; > + else > + filter = ca.drv->smudge; > required = ca.drv->required; > } > > @@ -918,7 +992,7 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > } > /* > * CRLF conversion can be skipped if normalizing, unless there > - * is a smudge filter. The filter might expect CRLFs. > + * is a filter. The filter might expect CRLFs. > */ > if (filter || !normalizing) { > ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action); > @@ -928,21 +1002,30 @@ static int convert_to_working_tree_internal(const char *path, const char *src, > } > } > > - ret_filter = apply_filter(path, src, len, -1, dst, filter); > + ret_filter = apply_filter(path, destpath, src, len, -1, dst, filter); > if (!ret_filter && required) > - die("%s: smudge filter %s failed", path, ca.drv->name); > + die("%s: %s filter %s failed", path, destpath ? "smudgeToFile" : "smudge", ca.drv->name); > > return ret | ret_filter; > } > > int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst) > { > - return convert_to_working_tree_internal(path, src, len, dst, 0); > + return convert_to_working_tree_internal(path, NULL, src, len, dst, 0); > +} > + > +int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len) > +{ > + struct strbuf output = STRBUF_INIT; > + int ret = convert_to_working_tree_internal(path, destpath, src, len, &output, 0); > + /* The smudgeToFile filter stdout is not used. */ > + strbuf_release(&output); > + return ret; > } > > int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) > { > - int ret = convert_to_working_tree_internal(path, src, len, dst, 1); > + int ret = convert_to_working_tree_internal(path, NULL, src, len, dst, 1); > if (ret) { > src = dst->buf; > len = dst->len; > diff --git a/convert.h b/convert.h > index 82871a1..6f46d10 100644 > --- a/convert.h > +++ b/convert.h > @@ -42,6 +42,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len, > struct strbuf *dst, enum safe_crlf checksafe); > extern int convert_to_working_tree(const char *path, const char *src, > size_t len, struct strbuf *dst); > +extern int convert_to_working_tree_filter_to_file(const char *path, > + const char *destpath, > + const char *src, > + size_t len); > extern int renormalize_buffer(const char *path, const char *src, size_t len, > struct strbuf *dst); > static inline int would_convert_to_git(const char *path) > @@ -53,6 +57,12 @@ extern void convert_to_git_filter_fd(const char *path, int fd, > struct strbuf *dst, > enum safe_crlf checksafe); > extern int would_convert_to_git_filter_fd(const char *path); > +/* Precondition: can_clean_from_file(path) == true */ > +extern void convert_to_git_filter_from_file(const char *path, > + struct strbuf *dst, > + enum safe_crlf checksafe); > +extern int can_clean_from_file(const char *path); > +extern int can_smudge_to_file(const char *path); > > /***************************************************************** > * > -- > 2.8.1 > > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html