Re: [filter-repo PATCH]: add --callbacks option to load many callbacks from one file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey,

Thanks for sending this in.

On Tue, Sep 3, 2024 at 12:43 PM Zack Weinberg <zack@xxxxxxxxxxxx> wrote:
>
> If you are trying to do something complicated with filter-repo,
> you might need to share state among several callbacks, which is
> currently impossible (short of poking values into someone else’s
> namespace) because each callback defined on the command line gets
> its own globals dictionary.

Sharing state is not impossible; it's done in multiple examples in
filter-repo.  If you want to do something complicated with
filter-repo, you can just import it as a library.
contrib/filter-repo-demos includes several more complicated examples,
including complete replacements for git-filter-branch and
bfg-repo-cleaner.  Let's look at how some of them hook up callbacks:

$ git grep 'callback=.*\.' contrib/
contrib/filter-repo-demos/bfg-ish:    self.filter =
fr.RepoFilter(fr_args, commit_callback=self.commit_update)
contrib/filter-repo-demos/clean-ignore:  filter = fr.RepoFilter(args,
commit_callback=checker.skip_ignores)
contrib/filter-repo-demos/filter-lamely:
 commit_callback=self.fixup_commit,
contrib/filter-repo-demos/filter-lamely:
 refname_callback=self.tag_rename,
contrib/filter-repo-demos/filter-lamely:
 tag_callback=self.deref_tags)

The fact that these callbacks point towards an instance method
function makes it clear that these functions will all have access to
the relevant instance ('self' or 'checker' in the examples above),
even when they are different callbacks.  They also have wider access
to whatever globals you might want to define in that file as well;
it's only the simple one-off defined-at-the-command-line callbacks
that were geared towards the simplistic cases that have a separation.

> Or, if you are trying to do something simple but long-winded, such as
> replacing the entire contents of a file, you might want to define long
> multi-line (byte) strings as global variables, to avoid having to deal
> with the undocumented number of spaces inserted at the beginning of
> each line by the callback parser.

Yeah, I can see how the added spaces would be slightly annoying for
the case of multi-line strings (though simple callbacks like
`--name-callback 'return name.replace(b"Wiliam", b"William")'` require
that some kind of leading whitespace be added, and the command line
--*-callback options are targetted towards the simpler usecases, after
all).  However, even in that case you can just use textwrap.dedent.
For example:

git filter-repo --blob-callback '
  import textwrap
  blob.data = bytes(textwrap.dedent("""\
    This is the new
    file that I am
    replacing every blob
    with.  It is great.
    """), "utf-8")
'

And now every file in your repository is replaced by one with the
following contents:
"""
This is the new
file that I am
replacing every blob
with.  It is great.
"""
Notice the lack of leading spaces.

> To facilitate these kinds of uses, add a new command line option
> `--callbacks`.  The argument to this option is a file, which should
> define callback functions, using the same naming convention as is
> described for individual command line callbacks, e.g.
>
>     def name_callback(name):
>         ...
>
> to set the name callback.  Any Python callable is acceptable, e.g.
>
>     class Callbacks:
>         def name_callback(self, name):
>             ...
>
>     callbacks = Callbacks()
>     name_callback = callbacks.name_callback
>
> will also work.  People who know about the undocumented second argument
> to some callbacks may define callbacks that take two arguments.
>
> The callbacks file is loaded as an ordinary Python module; it does _not_
> get any automatic globals, unlike individual command line callbacks.
> However, `import git_format_repo` will work inside the callbacks file,
> even if git_format_repo.py has not been made available in general.

What is this git_format_repo thing you speak of?

Anyway, separate from that and given the above comments I made about
importing git_filter_repo and textwrap.dedent, I'm a little unsure why
this new callbacks command line flag helps.  The usage of
git_filter_repo.py as a library exists for the general case already.
Simple command line flags like `--path` or `--replace-text` exist to
handle the simplest cases.  All the *-callback command line flags
exist to provide a middle ground where the user needs something a bit
more generic and programmatic, but is still targeting a certain piece
of data from the fast-export stream and doesn't want the little extra
verbosity from placing things in a separate file and importing
git_filter_repo and the few lines of glue necessary to hook it up.

In that scheme, this new --callbacks option doesn't seem to "fit" to
me.  It makes the middle ground more generic, but not as generic as
the ability to just import git_filter_repo and do whatever you want.
There's no specific targeting it provides, which makes it feel to me
as there's no specific reason for it to exist separate from the more
generic usecase.

> Tests are added which lightly exercise the new feature, and I have
> also used it myself for a real repo rewrite (of the “simple but
> long-winded” variety).
>
> ----
>
> Also document (briefly) the existing feature of supplying a file
> name rather than an inline function body to --foo-callback options,
> and the availability of an unspecified set of globals to individual
> callbacks (with instruction to see the source code for details).

Thanks for being helpful.  Do note, though, that different logical
changes should be in separate patches.

> This patch introduces uses of the Python standard library modules
> errno, importlib, and inspect.  All functionality used from these
> modules was available in 3.6 or earlier.

Since I recently bumped the minimum required python to 3.6, this is
all good.  Thanks for calling it out.

> This patch introduces several new translatable strings and changes
> one existing translatable string.

Again, thanks for taking the time to call this out.

> ---
>  Documentation/git-filter-repo.txt |  39 ++++++-
>  git-filter-repo                   | 164 +++++++++++++++++++++++-------
>  t/t9391-filter-repo-lib-usage.sh  |   2 +-
>  t/t9392-python-callback.sh        |  57 ++++++++++-
>  4 files changed, 219 insertions(+), 43 deletions(-)

I skimmed over the patch briefly, and it seems to generally be
reasonable, but I didn't look real close since I'm not sure if the
feature "fits".  Does the knowledge about textwrap.dedent or the
ability to import git_filter_repo help you out?



Hope that helps,
Elijah





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux