Re: [RFC PATCH v3 1/5] clone: test for our behavior on odd objects/* content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Ævar

First of all, I must apologize for my very late reply. I just got back from a trip and only now have been able to look again at this series.

On Fri, Mar 1, 2019 at 10:49 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
>
> On Thu, Feb 28 2019, Matheus Tavares Bernardino wrote:
>
> > Hi, Ævar
> >
> > I'm finishing the required changes in this series to send a v4, but
> > when submitting to travis ci, I got some errors on the
> > t5604-clone-reference test:
> > https://travis-ci.org/MatheusBernardino/git/builds/500007587
>
> I don't have access to an OSX box, but could reproduce the failure on
> NetBSD.
>
> It's because there link() when faced with a symlink behaves
> differently. On GNU/Linux link()-ing a symlink will produce another
> symlink like it, on NetBSD (and presumably OSX) doing that will produce
> a hardlink to the file the existing symlink points to.

Hm, interesting. I installed NetBSD here and played with it a little: It seems that the inconsistency comes from the fact that link() follows symlinks on NetBSD but not on Linux. i.e., if you have a file "C" link()-ed to a file "B" which, in turn, is a symlink to a file "A", running "ls -li A B C" we can see that:
- On linux, C points to B's inode
- On NetBSD, C points to A's inode

> I've pushed out a version of mine here which you might want to pull in:
> https://github.com/git/git/compare/master...avar:clone-dir-iterator-3
>
> I.e. this whole thing is silly, but just preserving the notion that
> we're not going to introduce behavior changes as we're refactoring.
>
> So it adds a commit right after the tests I added to detect this case,
> and use symlink() or link() as appropriate instead of link().

I think this still modifies [a little] the current behaviour, since symlink() will make dest->buf point to a new inode (which will be a symlink just as src->buf, but on a different inode) while the current behaviour, on Linux, is to have dest->buf being a hardlink to src->buf (same inode). I don't know if this sentence got too confuse, but what I meant is that symlink() will make a symlink at dest->buf while link(), on linux, will make a hardlink to the given symlink.

> There's then a commit at the end you might want to squash in that
> reproduces this behavior on top of your iterator refactoring.
>
> Of course the DIR_ITERATOR_FOLLOW_SYMLINKS flag at this point is rather
> silly. We're telling it to stat(), and then end up needing both stat()
> and lstat() data.
>
> I'm starting to think that this interface which previously only had one
> caller, but now has two exists at the wrong abstraction level. I.e. it
> itself needs to call lstat(). Seems sensible to always do that and leave
> it to the caller to call stat() if they need, as I believe Duy pointed
> out.

I see what you mean, but if the caller needs to call stat() itself, in the occurrence of a symlink to a directory, it would have to start a new directory iteration upon the symlinked dir and its subdirectories (if it wants to follow symlinks). This approach could became a little messy, IMHO. And just by calling stat() at dir-iterator we already get the symlinked directories iterated "for free", without having to modify anything else in the code. So I still think it is a good idea to have the DIR_ITERATOR_FOLLOW_SYMLINKS flag at dir-iterator, making it call stat() instead of lstat().

> Also noticed that dir-iterator.h still has a comment to the effect
> that it'll call "lstat()", even though we now have a "stat() or
> lstat()?" flag.

Thanks for noticing it, I will fix it in v4.

> > On Tue, Feb 26, 2019 at 9:28 AM Ævar Arnfjörð Bjarmason
> > <avarab@xxxxxxxxx> wrote:
> >>
> >> Add tests for what happens when we locally clone .git/objects
> >> directories where some of the loose objects or packs are symlinked, or
> >> when when there's unknown files there.
> >>
> >> I'm bending over backwards here to avoid a SHA1 dependency. See [1]
> >> for an earlier and simpler version that hardcoded a SHA-1s.
> >>
> >> This behavior has been the same for a *long* time, but hasn't been
> >> tested for.
> >>
> >> There's a good post-hoc argument to be made for copying over unknown
> >> things, e.g. I'd like a git version that doesn't know about the
> >> commit-graph to copy it under "clone --local" so a newer git version
> >> can make use of it.
> >>
> >> But the behavior showed where with symlinks seems pretty
> >> random. E.g. if "pack" is a symlink we end up with two copies of the
> >> contents, and only transfer some symlinks as-is.
> >>
> >> In follow-up commits we'll look at changing some of this behavior, but
> >> for now let's just assert it as-is so we'll notice what we'll change
> >> later.
> >>
> >> 1. https://public-inbox.org/git/20190226002625.13022-5-avarab@xxxxxxxxx/
> >>
> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
> >> ---
> >> t/t5604-clone-reference.sh | 142 +++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 142 insertions(+)
> >>
> >> diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
> >> index 4320082b1b..cb0dc22d14 100755
> >> --- a/t/t5604-clone-reference.sh
> >> +++ b/t/t5604-clone-reference.sh
> >> @@ -221,4 +221,146 @@ test_expect_success 'clone, dissociate from alternates' '
> >>         ( cd C && git fsck )
> >>  '
> >>
> >> +test_expect_success 'setup repo with garbage in objects/*' '
> >> +       git init S &&
> >> +       (
> >> +               cd S &&
> >> +               test_commit A &&
> >> +
> >> +               cd .git/objects &&
> >> +               >.some-hidden-file &&
> >> +               >some-file &&
> >> +               mkdir .some-hidden-dir &&
> >> +               >.some-hidden-dir/some-file &&
> >> +               >.some-hidden-dir/.some-dot-file &&
> >> +               mkdir some-dir &&
> >> +               >some-dir/some-file &&
> >> +               >some-dir/.some-dot-file
> >> +       )
> >> +'
> >> +
> >> +test_expect_success 'clone a repo with garbage in objects/*' '
> >> +       for option in --local --no-hardlinks --shared --dissociate
> >> +       do
> >> +               git clone $option S S$option || return 1 &&
> >> +               git -C S$option fsck || return 1
> >> +       done &&
> >> +       find S-* -name "*some*" | sort >actual &&
> >> +       cat >expected <<-EOF &&
> >> +       S--dissociate/.git/objects/.some-hidden-file
> >> +       S--dissociate/.git/objects/some-dir
> >> +       S--dissociate/.git/objects/some-dir/.some-dot-file
> >> +       S--dissociate/.git/objects/some-dir/some-file
> >> +       S--dissociate/.git/objects/some-file
> >> +       S--local/.git/objects/.some-hidden-file
> >> +       S--local/.git/objects/some-dir
> >> +       S--local/.git/objects/some-dir/.some-dot-file
> >> +       S--local/.git/objects/some-dir/some-file
> >> +       S--local/.git/objects/some-file
> >> +       S--no-hardlinks/.git/objects/.some-hidden-file
> >> +       S--no-hardlinks/.git/objects/some-dir
> >> +       S--no-hardlinks/.git/objects/some-dir/.some-dot-file
> >> +       S--no-hardlinks/.git/objects/some-dir/some-file
> >> +       S--no-hardlinks/.git/objects/some-file
> >> +       EOF
> >> +       test_cmp expected actual
> >> +'
> >> +
> >> +test_expect_success SYMLINKS 'setup repo with manually symlinked objects/*' '
> >> +       git init T &&
> >> +       (
> >> +               cd T &&
> >> +               test_commit A &&
> >> +               git gc &&
> >> +               (
> >> +                       cd .git/objects &&
> >> +                       mv pack packs &&
> >> +                       ln -s packs pack
> >> +               ) &&
> >> +               test_commit B &&
> >> +               (
> >> +                       cd .git/objects &&
> >> +                       find ?? -type d >loose-dirs &&
> >> +                       last_loose=$(tail -n 1 loose-dirs) &&
> >> +                       mv $last_loose a-loose-dir &&
> >> +                       ln -s a-loose-dir $last_loose &&
> >> +                       first_loose=$(head -n 1 loose-dirs) &&
> >> +                       (
> >> +                               cd $first_loose &&
> >> +                               obj=$(ls *) &&
> >> +                               mv $obj ../an-object &&
> >> +                               ln -s ../an-object $obj
> >> +                       ) &&
> >> + find . -type f | sort >../../../T.objects-files.raw && > >> + find . -type l | sort >../../../T.objects-links.raw
> >> +               )
> >> +       ) &&
> >> +       git -C T fsck &&
> >> +       git -C T rev-list --all --objects >T.objects
> >> +'
> >> +
> >> +
> >> +test_expect_success SYMLINKS 'clone repo with symlinked objects/*' '
> >> +       for option in --local --no-hardlinks --shared --dissociate
> >> +       do
> >> +               git clone $option T T$option || return 1 &&
> >> +               git -C T$option fsck || return 1 &&
> >> + git -C T$option rev-list --all --objects >T$option.objects &&
> >> +               test_cmp T.objects T$option.objects &&
> >> +               (
> >> +                       cd T$option/.git/objects &&
> >> + find . -type f | sort >../../../T$option.objects-files.raw && > >> + find . -type l | sort >../../../T$option.objects-links.raw
> >> +               )
> >> +       done &&
> >> +
> >> +       for raw in $(ls T*.raw)
> >> +       do
> >> + sed -e "s!/..\$!/X!; s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" <$raw >$raw.de-sha || return 1
> >> +       done &&
> >> +
> >> +       cat >expected-files <<-EOF &&
> >> +       ./Y/Z
> >> +       ./a-loose-dir/Z
> >> +       ./an-object
> >> +       ./Y/Z
> >> +       ./info/packs
> >> +       ./loose-dirs
> >> +       ./pack/pack-Z.idx
> >> +       ./pack/pack-Z.pack
> >> +       ./packs/pack-Z.idx
> >> +       ./packs/pack-Z.pack
> >> +       EOF
> >> +       cat >expected-links <<-EOF &&
> >> +       ./Y/Z
> >> +       EOF
> >> +       for option in --local --dissociate
> >> +       do
> >> + test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 && > >> + test_cmp expected-links T$option.objects-links.raw.de-sha || return 1
> >> +       done &&
> >> +
> >> +       cat >expected-files <<-EOF &&
> >> +       ./Y/Z
> >> +       ./Y/Z
> >> +       ./a-loose-dir/Z
> >> +       ./an-object
> >> +       ./Y/Z
> >> +       ./info/packs
> >> +       ./loose-dirs
> >> +       ./pack/pack-Z.idx
> >> +       ./pack/pack-Z.pack
> >> +       ./packs/pack-Z.idx
> >> +       ./packs/pack-Z.pack
> >> +       EOF
> >> + test_cmp expected-files T--no-hardlinks.objects-files.raw.de-sha &&
> >> +       test_must_be_empty T--no-hardlinks.objects-links.raw.de-sha &&
> >> +
> >> +       cat >expected-files <<-EOF &&
> >> +       ./info/alternates
> >> +       EOF
> >> +       test_cmp expected-files T--shared.objects-files.raw &&
> >> +       test_must_be_empty T--shared.objects-links.raw
> >> +'
> >> +
> >>  test_done
> >> --
> >> 2.21.0.rc2.261.ga7da99ff1b
> >>



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux