Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 28 2019, Thomas Gummerer wrote:

> On 03/22, Matheus Tavares wrote:
>> There is currently an odd behaviour when locally clonning a repository
>> with symlinks at .git/objects: using --no-hardlinks all symlinks are
>> dereferenced but without it Git will try to hardlink the files with the
>> link() function, which has an OS-specific behaviour on symlinks. On OSX
>> and NetBSD, it creates a hardlink to the file pointed by the symlink
>> whilst on GNU/Linux, it creates a hardlink to the symlink itself.
>>
>> On Manjaro GNU/Linux:
>>     $ touch a
>>     $ ln -s a b
>>     $ link b c
>>     $ ls -li a b c
>>     155 [...] a
>>     156 [...] b -> a
>>     156 [...] c -> a
>>
>> But on NetBSD:
>>     $ ls -li a b c
>>     2609160 [...] a
>>     2609164 [...] b -> a
>>     2609160 [...] c
>>
>> It's not good to have the result of a local clone to be OS-dependent and
>> since the behaviour on GNU/Linux may result in broken symlinks, let's
>> re-implement it with linkat() instead of link() using a flag to always
>> follow symlinks and make the hardlink be to the pointed file. With this,
>> besides standardizing the behaviour, no broken symlinks will be
>> produced. Also, add tests for symlinked files at .git/objects/.
>>
>> Note: Git won't create symlinks at .git/objects itself, but it's better
>> to handle this case and be friendly with users who manually create them.
>>
>> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> ---
>>  builtin/clone.c            |  2 +-
>>  t/t5604-clone-reference.sh | 26 +++++++++++++++++++-------
>>  2 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/builtin/clone.c b/builtin/clone.c
>> index 50bde99618..b76f33c635 100644
>> --- a/builtin/clone.c
>> +++ b/builtin/clone.c
>> @@ -443,7 +443,7 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>>  		if (unlink(dest->buf) && errno != ENOENT)
>>  			die_errno(_("failed to unlink '%s'"), dest->buf);
>>  		if (!option_no_hardlinks) {
>> -			if (!link(src->buf, dest->buf))
>> +			if (!linkat(AT_FDCWD, src->buf, AT_FDCWD, dest->buf, AT_SYMLINK_FOLLOW))
>
> This line is starting to get a bit long, might be worth breaking it up
> to keep to 80 characters per line.
>
> I notice that we are currently not using 'linkat()' anywhere else in
> our codebase.  It looks like it has been introduced in POSIX.1-2008,
> which sounds fairly recent by git's standards.  So I wonder if this is
> really supported on all platforms that git is being built on.
>
> I also wonder what would need to be done on Windows if we were to
> introduce this.  I see we define the 'link()' function in
> 'compat/mingw.c' for that currently, so I guess something similar
> would be needed for 'linkat()'.  I added Dscho to Cc for Windows
> expertise.

For better of worse this particular quest started because I pointed out
(with some WIP patches) that for understanding this change we should
test whatever we did now, to ensure that the refactoring didn't have
unintended side-effects.

But that's a separate question from whether or not we want to keep the
current behavior.

I think the current behavior is clearly insane, so I think we should
change it with some follow-up patches. In particular options like
--dissociate should clearly (in my mind at least) have behavior similar
to "cp -L", and --local should hardlink to the *target* of the symlink,
if anything, at least for objects/{??,pack,info}

I think that changes the portability story with linkat(), since it's not
something we should be planning to keep, just an intermediate step so we
don't have a gigantic patch that both adds tests, refactors and changes
the behavior.

> While I agree with the goal of consistency accross all platforms here,
> I don't know if it's actually worth going through the pain of doing
> that, especially for somewhat of an edge case in local clones.

Note that we explicitly clone everything under objects/, including
recursively cloning unknown directories and their files.

So this is not just say about how we handle symlinks that we don't
expect now (nothing uses them), but if we want to make the promise that
nothing in objects/ will ever use symlinks. Or more specifically, that
if a new version of git starts using it that something doing local
clones might produce a broken copy of such a repo.

Maybe we'll still say "we don't care". Just saying it's a slightly
different question...

> If the test in the previous patch passes on all platforms, I'd be okay
> with just calling the behaviour here undefined, especially as git
> would never actually create symlinks in the .git/objects directory.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux