Re: [GSoC][PATCH 3/3] clone: use dir-iterator to avoid explicit dir traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 23 2019, Matheus Tavares wrote:

> Replace usage of opendir/readdir/closedir API to traverse directories
> recursively, at copy_or_link_directory function, by the dir-iterator
> API. This simplifies the code and avoid recursive calls to
> copy_or_link_directory.

Sounds good in principle.

> This process also brings some safe behaviour changes to
> copy_or_link_directory:

I ad-hoc tested some of these, and could spot behavior changes. We
should have tests for these.

>  - It will no longer follows symbolic links. This is not a problem,
>    since the function is only used to copy .git/objects directory, and
>    symbolic links are not expected there.

I don't think we should make that assumption, and I don't know of
anything else in git that does.

I've certainly symlinked individual objects or packs into a repo for
debugging / recovery, and it would be unexpected to clone that and miss
something.

So in the general case we should be strict in what we generate, but
permissive in what we accept. We don't want a "clone" of an existing
repo to fail, or "fsck" to fail after clone...

When trying to test this I made e.g. objects/c4 a symlink to /tmp/c4,
and a specific object in objects/4d/ a symlink to /tmp too.

Without this patch the individual object is still a symlink, but the
object under the directory gets resolved, and "un-symlinked", also with
--dissociate, which seems like an existing bug.

With your patch that symlink structure is copied as-is. That's more
faithful under --local, but a regression for --dissociate (which didn't
work fully to begin with...).

I was paranoid that "no longer follows symbolic links" could also mean
"will ignore those objects", but it seems to more faithfully copy things
as-is for *that* case.

But then I try with --no-hardlinks and stock git dereferences my symlink
structure, but with your patches fails completely:

    Cloning into bare repository 'repo2'...
    error: copy-fd: read returned: Is a directory
    fatal: failed to copy file to 'repo2/objects/c4': Is a directory
    fatal: the remote end hung up unexpectedly
    fatal: cannot change to 'repo2': No such file or directory

So there's at least one case in a few minutes of prodding this where we
can't clone a working repo now, however obscure the setup.

>  - Hidden directories won't be skipped anymore. In fact, it is odd that
>    the function currently skip hidden directories but not hidden files.
>    The reason for that could be unintentional: probably the intention
>    was to skip '.' and '..' only, but it ended up accidentally skipping
>    all directories starting with '.'. Again, it must not be a problem
>    not to skip hidden dirs since hidden dirs/files are not expected at
>    .git/objects.

I reproduce this with --local. A ".foo" isn't copied before, now it
is. Good, I guess. We'd have already copied a "foo".

>  - Now, copy_or_link_directory will call die() in case of an error on
>    openddir, readdir or lstat, inside dir_iterator_advance. That means
>    it will abort in case of an error trying to fetch any iteration
>    entry.

Good, but really IMNSHO this series is tweaking some critical core code
and desperately needs tests.

Unfortunately, in this as in so many edge case we have no existing
tests.

This would be much easier to review and would give reviewers more
confidence if the parts of this that changed behavior started with a
patch or patches that just manually objects/ dirs with various
combinations of symlinks, hardlinks etc., and asserted that the various
options did exactly what they're doing now, and made sure the
source/target repos were the same after/both passed "fsck".

Then followed by some version of this patch which changes the behavior,
and would be forced to tweak those tests. To make it clear e.g. that
some cases where we have a working "clone" are now a hard error.

Thanks for working on this!



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux