On 24/11/19 11:48AM, Patrick Steinhardt wrote: > When running scheduled maintenance via `git maintenance start`, we > acquire a lockfile to ensure that no other scheduled maintenance task is > running in the repository concurrently. If so, we do provide an error to > the user hinting that another process seems to be running in this repo. > > There are two important cases why such a lockfile may exist: > > - An actual git-maintenance(1) process is still running in this > repository. > > - An earlier process may have crashed or was interrupted part way > through and has left a stale lockfile behind. > > In c95547a394 (builtin/gc: fix crash when running `git maintenance > start`, 2024-10-10), we have fixed an issue where git-maintenance(1) > would crash with the "start" subcommand, and the underlying bug causes > the second scenario to trigger quite often now. > > Most users don't know how to get out of that situation again though. > Ideally, we'd be removing the stale lock for our users automatically. > But in the context of repository maintenance this is rather risky, as it > can easily run for hours or even days. So finding a clear point where we > know that the old process has exited is basically impossible. Indeed, providing a hint to help affected users here make sense to me. > > We have the same issue in other subsystems, e.g. when locking refs. Our > lockfile interfaces thus provide the `unable_to_lock_message()` function > for exactly this purpose: it provides a nice hint to the user that > explains what is going on and how to get out of that situation again by > manually removing the file. > > Adapt git-maintenance(1) to print a similar hint. While we could use the > above function, we can provide a bit more context as we know exactly > what kind of process would create the lockfile. The added context provided by having a more specific message seems appropriate. Looking at the message provided through `unable_to_lock_message()`, I could see the generic example it provides being a bit confusing here. > > Reported-by: Miguel Rincon Barahona <mrincon@xxxxxxxxxx> > Reported-by: Kev Kloss <kkloss@xxxxxxxxxx> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > Hi, > > this issue was reported to me internally at GitLab with the suggestion > of providing a hint for how to get out of the situation again. > > Patrick > --- > builtin/gc.c | 11 ++++++++++- > t/t7900-maintenance.sh | 8 ++++++++ > 2 files changed, 18 insertions(+), 1 deletion(-) > > diff --git a/builtin/gc.c b/builtin/gc.c > index d52735354c9f87ba4e8acb593dd11aa0482223e1..34848626e47c777112994e5b5c558b65952ac8c2 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -2890,8 +2890,17 @@ static int update_background_schedule(const struct maintenance_start_opts *opts, > char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path); > > if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) { > + if (errno == EEXIST) > + error(_("unable to create '%s.lock': %s.\n\n" > + "Another scheduled git-maintenance(1) process seems to be running in this\n" > + "repository. Please make sure no other maintenance processes are running and\n" > + "then try again. If it still fails, a git-maintenance(1) process may have\n" > + "crashed in this repository earlier: remove the file manually to continue."), > + absolute_path(lock_path), strerror(errno)); > + else > + error_errno(_("cannot acquire lock for scheduled background maintenance")); > free(lock_path); > - return error(_("another process is scheduling background maintenance")); > + return -1; > } > > for (i = 1; i < ARRAY_SIZE(scheduler_fn); i++) { > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > index c224c8450c85f567bc29258e18b4a59fe6682f0a..6d6ffaaf376bdbadecdb23a460994af1d218dc19 100755 > --- a/t/t7900-maintenance.sh > +++ b/t/t7900-maintenance.sh > @@ -1011,4 +1011,12 @@ test_expect_success 'repacking loose objects is quiet' ' > ) > ' > > +test_expect_success 'maintenance aborts with existing lock file' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + : >repo/.git/objects/schedule.lock && > + test_must_fail git -C repo maintenance start 2>err && > + test_grep "Another scheduled git-maintenance(1) process seems to be running" err > +' > + > test_done > > --- > base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559 > change-id: 20241119-pks-maintenance-hint-with-stale-lock-35e5417d632b The changes in this patch are straightforward and look good to me :) -Justin