Re: [PATCH] builtin/gc: provide hint when maintenance hits a stale schedule lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24/11/19 11:48AM, Patrick Steinhardt wrote:
> When running scheduled maintenance via `git maintenance start`, we
> acquire a lockfile to ensure that no other scheduled maintenance task is
> running in the repository concurrently. If so, we do provide an error to
> the user hinting that another process seems to be running in this repo.
> 
> There are two important cases why such a lockfile may exist:
> 
>   - An actual git-maintenance(1) process is still running in this
>     repository.
> 
>   - An earlier process may have crashed or was interrupted part way
>     through and has left a stale lockfile behind.
> 
> In c95547a394 (builtin/gc: fix crash when running `git maintenance
> start`, 2024-10-10), we have fixed an issue where git-maintenance(1)
> would crash with the "start" subcommand, and the underlying bug causes
> the second scenario to trigger quite often now.
> 
> Most users don't know how to get out of that situation again though.
> Ideally, we'd be removing the stale lock for our users automatically.
> But in the context of repository maintenance this is rather risky, as it
> can easily run for hours or even days. So finding a clear point where we
> know that the old process has exited is basically impossible.

Indeed, providing a hint to help affected users here make sense to me.

> 
> We have the same issue in other subsystems, e.g. when locking refs. Our
> lockfile interfaces thus provide the `unable_to_lock_message()` function
> for exactly this purpose: it provides a nice hint to the user that
> explains what is going on and how to get out of that situation again by
> manually removing the file.
> 
> Adapt git-maintenance(1) to print a similar hint. While we could use the
> above function, we can provide a bit more context as we know exactly
> what kind of process would create the lockfile.

The added context provided by having a more specific message seems
appropriate. Looking at the message provided through
`unable_to_lock_message()`, I could see the generic example it provides
being a bit confusing here.

> 
> Reported-by: Miguel Rincon Barahona <mrincon@xxxxxxxxxx>
> Reported-by: Kev Kloss <kkloss@xxxxxxxxxx>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
> Hi,
> 
> this issue was reported to me internally at GitLab with the suggestion
> of providing a hint for how to get out of the situation again.
> 
> Patrick
> ---
>  builtin/gc.c           | 11 ++++++++++-
>  t/t7900-maintenance.sh |  8 ++++++++
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index d52735354c9f87ba4e8acb593dd11aa0482223e1..34848626e47c777112994e5b5c558b65952ac8c2 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -2890,8 +2890,17 @@ static int update_background_schedule(const struct maintenance_start_opts *opts,
>  	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
>  
>  	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
> +		if (errno == EEXIST)
> +			error(_("unable to create '%s.lock': %s.\n\n"
> +			    "Another scheduled git-maintenance(1) process seems to be running in this\n"
> +			    "repository. Please make sure no other maintenance processes are running and\n"
> +			    "then try again. If it still fails, a git-maintenance(1) process may have\n"
> +			    "crashed in this repository earlier: remove the file manually to continue."),
> +			    absolute_path(lock_path), strerror(errno));
> +		else
> +			error_errno(_("cannot acquire lock for scheduled background maintenance"));
>  		free(lock_path);
> -		return error(_("another process is scheduling background maintenance"));
> +		return -1;
>  	}
>  
>  	for (i = 1; i < ARRAY_SIZE(scheduler_fn); i++) {
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index c224c8450c85f567bc29258e18b4a59fe6682f0a..6d6ffaaf376bdbadecdb23a460994af1d218dc19 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -1011,4 +1011,12 @@ test_expect_success 'repacking loose objects is quiet' '
>  	)
>  '
>  
> +test_expect_success 'maintenance aborts with existing lock file' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	: >repo/.git/objects/schedule.lock &&
> +	test_must_fail git -C repo maintenance start 2>err &&
> +	test_grep "Another scheduled git-maintenance(1) process seems to be running" err
> +'
> +
>  test_done
> 
> ---
> base-commit: 090d24e9af6e9f59c3f7bee97c42bb1ae3c7f559
> change-id: 20241119-pks-maintenance-hint-with-stale-lock-35e5417d632b

The changes in this patch are straightforward and look good to me :)

-Justin




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux