Hey, Thanks for sending this in. On Tue, Sep 3, 2024 at 12:43 PM Zack Weinberg <zack@xxxxxxxxxxxx> wrote: > > If you are trying to do something complicated with filter-repo, > you might need to share state among several callbacks, which is > currently impossible (short of poking values into someone else’s > namespace) because each callback defined on the command line gets > its own globals dictionary. Sharing state is not impossible; it's done in multiple examples in filter-repo. If you want to do something complicated with filter-repo, you can just import it as a library. contrib/filter-repo-demos includes several more complicated examples, including complete replacements for git-filter-branch and bfg-repo-cleaner. Let's look at how some of them hook up callbacks: $ git grep 'callback=.*\.' contrib/ contrib/filter-repo-demos/bfg-ish: self.filter = fr.RepoFilter(fr_args, commit_callback=self.commit_update) contrib/filter-repo-demos/clean-ignore: filter = fr.RepoFilter(args, commit_callback=checker.skip_ignores) contrib/filter-repo-demos/filter-lamely: commit_callback=self.fixup_commit, contrib/filter-repo-demos/filter-lamely: refname_callback=self.tag_rename, contrib/filter-repo-demos/filter-lamely: tag_callback=self.deref_tags) The fact that these callbacks point towards an instance method function makes it clear that these functions will all have access to the relevant instance ('self' or 'checker' in the examples above), even when they are different callbacks. They also have wider access to whatever globals you might want to define in that file as well; it's only the simple one-off defined-at-the-command-line callbacks that were geared towards the simplistic cases that have a separation. > Or, if you are trying to do something simple but long-winded, such as > replacing the entire contents of a file, you might want to define long > multi-line (byte) strings as global variables, to avoid having to deal > with the undocumented number of spaces inserted at the beginning of > each line by the callback parser. Yeah, I can see how the added spaces would be slightly annoying for the case of multi-line strings (though simple callbacks like `--name-callback 'return name.replace(b"Wiliam", b"William")'` require that some kind of leading whitespace be added, and the command line --*-callback options are targetted towards the simpler usecases, after all). However, even in that case you can just use textwrap.dedent. For example: git filter-repo --blob-callback ' import textwrap blob.data = bytes(textwrap.dedent("""\ This is the new file that I am replacing every blob with. It is great. """), "utf-8") ' And now every file in your repository is replaced by one with the following contents: """ This is the new file that I am replacing every blob with. It is great. """ Notice the lack of leading spaces. > To facilitate these kinds of uses, add a new command line option > `--callbacks`. The argument to this option is a file, which should > define callback functions, using the same naming convention as is > described for individual command line callbacks, e.g. > > def name_callback(name): > ... > > to set the name callback. Any Python callable is acceptable, e.g. > > class Callbacks: > def name_callback(self, name): > ... > > callbacks = Callbacks() > name_callback = callbacks.name_callback > > will also work. People who know about the undocumented second argument > to some callbacks may define callbacks that take two arguments. > > The callbacks file is loaded as an ordinary Python module; it does _not_ > get any automatic globals, unlike individual command line callbacks. > However, `import git_format_repo` will work inside the callbacks file, > even if git_format_repo.py has not been made available in general. What is this git_format_repo thing you speak of? Anyway, separate from that and given the above comments I made about importing git_filter_repo and textwrap.dedent, I'm a little unsure why this new callbacks command line flag helps. The usage of git_filter_repo.py as a library exists for the general case already. Simple command line flags like `--path` or `--replace-text` exist to handle the simplest cases. All the *-callback command line flags exist to provide a middle ground where the user needs something a bit more generic and programmatic, but is still targeting a certain piece of data from the fast-export stream and doesn't want the little extra verbosity from placing things in a separate file and importing git_filter_repo and the few lines of glue necessary to hook it up. In that scheme, this new --callbacks option doesn't seem to "fit" to me. It makes the middle ground more generic, but not as generic as the ability to just import git_filter_repo and do whatever you want. There's no specific targeting it provides, which makes it feel to me as there's no specific reason for it to exist separate from the more generic usecase. > Tests are added which lightly exercise the new feature, and I have > also used it myself for a real repo rewrite (of the “simple but > long-winded” variety). > > ---- > > Also document (briefly) the existing feature of supplying a file > name rather than an inline function body to --foo-callback options, > and the availability of an unspecified set of globals to individual > callbacks (with instruction to see the source code for details). Thanks for being helpful. Do note, though, that different logical changes should be in separate patches. > This patch introduces uses of the Python standard library modules > errno, importlib, and inspect. All functionality used from these > modules was available in 3.6 or earlier. Since I recently bumped the minimum required python to 3.6, this is all good. Thanks for calling it out. > This patch introduces several new translatable strings and changes > one existing translatable string. Again, thanks for taking the time to call this out. > --- > Documentation/git-filter-repo.txt | 39 ++++++- > git-filter-repo | 164 +++++++++++++++++++++++------- > t/t9391-filter-repo-lib-usage.sh | 2 +- > t/t9392-python-callback.sh | 57 ++++++++++- > 4 files changed, 219 insertions(+), 43 deletions(-) I skimmed over the patch briefly, and it seems to generally be reasonable, but I didn't look real close since I'm not sure if the feature "fits". Does the knowledge about textwrap.dedent or the ability to import git_filter_repo help you out? Hope that helps, Elijah