Re: [RFC PATCH] Introduce "precious" file concept

[CC-ing some of the people involved in recent threads about this]

On Sun, Nov 11 2018, Nguyễn Thái Ngọc Duy wrote:

> Since this topic has come up twice recently, I revisited this
> "precious" thingy that I started four years ago and tried to see if I
> could finally finish it. There are a couple things to be sorted out...
>
> A new attribute "precious" is added to indicate that certain files
> have valuable content and should not be easily discarded even if they
> are ignored or untracked (*).
>
> So far there are two parts of Git that are made aware of precious
> files: "git clean" will leave precious files alone and unpack-trees.c
> (i.e. merges and branch switches) will not overwrite
> ignored-but-precious files.
>
> Is there any other parts of Git that should be made aware of this
> "precious" attribute?
>
> Also while "precious" is a fun name, but it does not sound serious.
> Any suggestions? Perhaps "valuable"?
>
> Very lightly tested. The patch is more to have something to discuss
> than is bug free and ready to use.
>
> (*) Note that tracked files could be marked "precious" in the future
>     too although the exact semantics is not very clear since tracked
>     files are by default precious.
>
>     But something like "index log" could use this to record all
>     changes to precious files instead of just "git add -p" changes,
>     for example. So these files are in a sense more precious than
>     other tracked files.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> ---
>  Documentation/git-clean.txt     |  3 ++-
>  Documentation/gitattributes.txt | 13 +++++++++++++
>  attr.c                          |  9 +++++++++
>  attr.h                          |  2 ++
>  builtin/clean.c                 | 19 ++++++++++++++++---
>  unpack-trees.c                  |  3 ++-
>  6 files changed, 44 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
> index 03056dad0d..a9beadfb12 100644
> --- a/Documentation/git-clean.txt
> +++ b/Documentation/git-clean.txt
> @@ -21,7 +21,8 @@ option is specified, ignored files are also removed. This can, for
>  example, be useful to remove all build products.
>
>  If any optional `<path>...` arguments are given, only those paths
> -are affected.
> +are affected. Ignored or untracked files with `precious` attributes
> +are not removed.
>
>  OPTIONS
>  -------
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index b8392fc330..c722479bdc 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -1188,6 +1188,19 @@ If this attribute is not set or has an invalid value, the value of the
>  (See linkgit:git-config[1]).
>
>
> +Precious files
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +`precious`
> +^^^^^^^^^^
> +
> +This attribute is set on files to indicate that their content is
> +valuable. Many commands will behave slightly different on precious
> +files. linkgit:git-clean[1] will leave precious files alone. Merging
> +and branch switching will not silently overwrite ignored files that
> +are marked "precious".
> +
> +
>  USING MACRO ATTRIBUTES
>  ----------------------
>
> diff --git a/attr.c b/attr.c
> index 60d284796d..d06ca0ae4b 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -1186,3 +1186,12 @@ void attr_start(void)
>  	pthread_mutex_init(&check_vector.mutex, NULL);
>  #endif
>  }
> +
> +int is_precious_file(struct index_state *istate, const char *path)
> +{
> +	static struct attr_check *check;
> +	if (!check)
> +		check = attr_check_initl("precious", NULL);
> +	git_check_attr(istate, path, check);
> +	return check && ATTR_TRUE(check->items[0].value);
> +}

If we merge two branches, is this using the merged post-image of
.gitattributes as a source?

>  	if (o->dir &&
> -	    is_excluded(o->dir, o->src_index, name, &dtype))
> +	    is_excluded(o->dir, o->src_index, name, &dtype) &&
> +	    !is_precious_file(o->src_index, name))
>  		/*
>  		 * ce->name is explicitly excluded, so it is Ok to
>  		 * overwrite it.

I wonder if instead we should just be reverting c81935348b ("Fix
switching to a branch with D/F when current branch has file D.",
2007-03-15), which these days (haven't dug deeply) would just be this,
right?:

    diff --git a/unpack-trees.c b/unpack-trees.c
    index 7570df481b..b3efaddd4f 100644
    --- a/unpack-trees.c
    +++ b/unpack-trees.c
    @@ -1894,13 +1894,6 @@ static int check_ok_to_remove(const char *name, int len, int dtype,
     	if (ignore_case && icase_exists(o, name, len, st))
     		return 0;

    -	if (o->dir &&
    -	    is_excluded(o->dir, o->src_index, name, &dtype))
    -		/*
    -		 * ce->name is explicitly excluded, so it is Ok to
    -		 * overwrite it.
    -		 */
    -		return 0;
     	if (S_ISDIR(st->st_mode)) {
     		/*
     		 * We are checking out path "foo" and

Something like the approach you're taking will absolutely work from a
technical standpoint, but I fear that it's going to be useless in
practice.

The users who need protection against git deleting their files the most
are exactly the sort of users who aren't expert-level enough to
understand the nuances of how the semantics of .gitignore and "precious"
are going to interact before git eats their data.

This is pretty apparent from the bug reports we're getting about
this. None of them are:

    "Hey, I 100% understood .gitignore semantics including this one part
    of the docs where you say you'll do this, but just forgot one day
    and deleted my work. Can we get some more safety?"

But rather (with some hyperbole for effect):

    "ZOMG git deleted my file! Is this a bug??"

So I think we should have the inverse of this "precious"
attribute. Just a change to the docs to say that .gitignore doesn't
imply these eager deletion semantics on tree unpacking anymore, and if
users want it back they can define a "garbage" attribute
(s/precious/garbage/).

That will lose no data, and in the very rare cases where a checkout of
tracked files would overwrite an ignored pattern, we can just error out
(as we do with the "Ok to overwrite" branch removed) and tell the user
to delete the files to proceed.

Three tests in our test suite fail with that patch applied, and they're
explicitly testing for exactly the sort of scenario where users are
likely to lose data. I.e.:

 1. Open a tracked file in an editor
 2. Save it
 3. Switch to a topic branch that has different .gitignore semantics
    (e.g. let's say a build/ dir exists there)
 4. Have their work deleted

So actually in writing this out I've become convinced that this
"precious" approach can't work either, because *even if* you're an
expert who manages to perfectly define their .gitignore and "precious"
rules in advance to avoid data deletion, those rules will *also* need to
take into account switching between branches (or even different
histories) where you have other sorts of rules.
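
A small sketch in C of why per-branch rules are a problem. Again this
is a toy model with made-up names ("struct branch_rules",
"precious_allows_overwrite()"), and it simplifies ignore and attribute
patterns to plain path prefixes. It does not reflect how git actually
loads .gitignore or .gitattributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the rules one branch's .gitignore and
 * .gitattributes apply. Real patterns are far richer than a prefix.
 */
struct branch_rules {
	const char *ignored_prefix;   /* paths under this prefix are ignored */
	const char *precious_prefix;  /* paths under this are precious, or NULL */
};

/* True if 'prefix' is set and 'path' starts with it. */
static bool has_prefix(const char *path, const char *prefix)
{
	return prefix && strncmp(path, prefix, strlen(prefix)) == 0;
}

/*
 * "precious" approach: an untracked path may be overwritten when the
 * consulted rules ignore it and do not mark it precious.
 */
static bool precious_allows_overwrite(const struct branch_rules *rules,
				      const char *path)
{
	assert(rules != NULL && path != NULL);
	return has_prefix(path, rules->ignored_prefix) &&
	       !has_prefix(path, rules->precious_prefix);
}
```

With one rule set of { "build/", "build/" } and another of
{ "build/", NULL }, the same path "build/notes.txt" is protected under
the first and expendable under the second. The answer depends entirely
on which side's rules get consulted during the switch.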

So really, if there's ambiguity let's just not delete stuff by default
and ask the user to resolve it.


