Re: problems serving non-bare repos with submodules over http

On Wed, 20 Apr 2016, Stefan Beller wrote:
> > I do realize that the situation is quite uncommon, partially, I guess, due
> > to the flexibility and power of the git submodule mechanism on one hand and
> > its under-use (imho) on the other, which leads to the discovery of regressions
> > [e.g. 1] and of corner cases such as mine.

> Thanks for fixing the under-use and reporting bugs. :)

I am thrilled to help ;)

> > [1] http://thread.gmane.org/gmane.comp.version-control.git/288064
> > [2] http://www.onerussian.com/tmp/git-web-submodules.sh

> > My use case: we are trying to serve a git repository with submodules
> > specified with relative paths over http from a simple web server. With a demo
> > case and submodule specification such as the following [a complete script to
> > reproduce it, including a webserver using Python, is at 2]:

> > (git)hopa:/tmp/gitxxmsxYFO[master]git
> > $> tree
> > .
> > ├── f1
> > └── sub1
> >     └── f2

> > $> cat .gitmodules
> > [submodule "sub1"]
> >     path = sub1
> >     url = ./sub1


> > 1. After cloning

> >     git clone http://localhost:8080/.git

> >    I cannot 'submodule update' sub1 in the clone, since its URL after
> >    'submodule init' would be http://localhost:8080/.git/sub1. If I fix it
> >    up manually, it seems to proceed normally, since in the original repository
> >    I have a sub1/.git/ directory and not a "gitlink" for that submodule.

> So the expected URL would be http://localhost:8080/sub1/.git ?

ATM, yes

> I thought you could leave out the .git suffix, i.e. you can type

>      git clone http://localhost:8080

> and Git will recognize the missing .git and try that as well. The relative URL
> would then be constructed as http://localhost:8080/sub1, which will use the
> same mechanism to find the missing .git ending.

[note 1] Unfortunately, that is not the case at the moment (git version
2.8.1.369.geae769a; the output is interspersed with the log of Python's simple
HTTP server):

$> git clone http://localhost:8080 xxx                   
Cloning into 'xxx'...             
127.0.0.1 - - [20/Apr/2016 15:01:25] code 404, message File not found
127.0.0.1 - - [20/Apr/2016 15:01:25] "GET /info/refs?service=git-upload-pack HTTP/1.1" 404 -
fatal: repository 'http://localhost:8080/' not found
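
For what it's worth, the relative-URL arithmetic behind item 1 and Stefan's
suggestion can be modeled roughly as below. This is a simplified Python sketch,
not git's actual code: it only handles "./" and "../" prefixes on an http(s)
superproject URL, and resolve_relative_url is a made-up name.

```python
def resolve_relative_url(super_url: str, sub_url: str) -> str:
    """Resolve a .gitmodules URL against the superproject's remote URL.

    Hypothetical helper, simplified model: only "./" and "../" prefixes are
    handled; scp-style URLs, local paths and "../" past the host are not.
    """
    if not sub_url.startswith(("./", "../")):
        # Absolute URLs are used as-is.
        return sub_url
    base = super_url.rstrip("/")
    rest = sub_url
    while True:
        if rest.startswith("./"):
            rest = rest[2:]
        elif rest.startswith("../"):
            rest = rest[3:]
            # Each "../" drops the last path component of the base URL.
            base = base.rsplit("/", 1)[0]
        else:
            break
    return base + "/" + rest


if __name__ == "__main__":
    # Item 1: superproject cloned from http://localhost:8080/.git
    print(resolve_relative_url("http://localhost:8080/.git", "./sub1"))
    # -> http://localhost:8080/.git/sub1
    # Stefan's suggestion: superproject cloned from http://localhost:8080
    print(resolve_relative_url("http://localhost:8080", "./sub1"))
    # -> http://localhost:8080/sub1
```

The first call reproduces the http://localhost:8080/.git/sub1 URL from item 1;
the second gives Stefan's http://localhost:8080/sub1, which would still need the
/.git sensing discussed in a. to be usable against this server.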


> > 2. If I serve the clone itself ([2] demonstrates that too), there is no easy
> >    remedy at all, since sub1/.git is not a directory but a gitlink.

> Not sure I understand the second question.

If I serve, over http, a repository in which sub1/.git is a "gitlink":

    (git)hopa:/tmp/gitxxmsxYFO_[master]
    $> cat sub1/.git 
    gitdir: ../.git/modules/sub1

Such a repository cannot be cloned:

    (git)hopa:/tmp/gitxxmsxYFO_[master]git
    $> git clone http://localhost:8080/sub1 /tmp/xxx
    Cloning into '/tmp/xxx'...                      
    127.0.0.1 - - [20/Apr/2016 15:04:01] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:01] "GET /sub1/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/' not found

    $> git clone http://localhost:8080/sub1/.git /tmp/xxx 
    Cloning into '/tmp/xxx'...
    127.0.0.1 - - [20/Apr/2016 15:04:06] code 404, message File not found
    127.0.0.1 - - [20/Apr/2016 15:04:06] "GET /sub1/.git/info/refs?service=git-upload-pack HTTP/1.1" 404 -
    fatal: repository 'http://localhost:8080/sub1/.git/' not found
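
The webserver serves files literally: sub1/.git is a small text file, so
/sub1/.git/info/refs simply does not exist and the answer is 404. Whatever would
handle this case (a smarter server, or the client-side awareness asked about in
c.) would have to follow the "gitdir:" pointer the way git does locally. A rough
Python sketch of that resolution, assuming a plain "gitdir: <path>" file whose
relative path is taken relative to the directory holding the .git file
(resolve_gitdir is a hypothetical name, and git itself handles more cases):

```python
import tempfile
from pathlib import Path
from typing import Optional

GITFILE_PREFIX = "gitdir: "


def resolve_gitdir(worktree: Path) -> Optional[Path]:
    """Return the git directory of `worktree`, following a .git file if needed.

    Hypothetical helper: a simplified model of how a "gitdir: ..." file is
    interpreted; it ignores the many extra cases git itself handles.
    """
    dotgit = worktree / ".git"
    if dotgit.is_dir():
        # Plain repository, e.g. sub1 in the original (non-cloned) tree.
        return dotgit
    if dotgit.is_file():
        content = dotgit.read_text(encoding="utf-8").strip()
        if content.startswith(GITFILE_PREFIX):
            target = Path(content[len(GITFILE_PREFIX):])
            if not target.is_absolute():
                # Relative to the directory holding the .git file, so
                # ../.git/modules/sub1 points into the superproject.
                target = worktree / target
            return target.resolve()
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        top = Path(d)
        (top / ".git" / "modules" / "sub1").mkdir(parents=True)
        (top / "sub1").mkdir()
        (top / "sub1" / ".git").write_text("gitdir: ../.git/modules/sub1\n")
        print(resolve_gitdir(top / "sub1"))
```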


> > N.B. I haven't approached the nested submodules case in [2] yet

> > I wondered

> > a. could 'git clone' (probably actually some relevant helper used by fetch
> >    etc.) acquire the ability to probe for URL/.git if URL itself doesn't
> >    point to a usable git repository?

> So you mean in case of relative submodules, we need to take the parent
> url, and remove the ".git" at the end and try again if we cannot find
> the submodule?

That would be an a.2, which I forgot to outline ;)

In a. I was suggesting what you assumed [note 1 above] would be happening
(but currently doesn't): that /.git would be sensed automagically.
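
To make a. concrete: the automatic sensing would amount to trying the URL as
given and, if the discovery request fails, retrying with /.git appended. A rough
Python sketch under that assumption; probe_repo_url and http_ok are hypothetical
names, not anything in git, and the info/refs?service=git-upload-pack request
mirrors what the server logs in note 1 show:

```python
import urllib.request
from typing import Callable, Optional


def info_refs_url(repo_url: str) -> str:
    # The first request a git client sends (see the server logs in note 1).
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def http_ok(url: str) -> bool:
    """True if a GET on `url` answers 200; HTTPError/URLError are OSErrors."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


def probe_repo_url(url: str, ok: Callable[[str], bool] = http_ok) -> Optional[str]:
    """Return the first of URL, URL/.git that looks like a served repository.

    Hypothetical fallback: this is the behaviour proposed in a.,
    not something git 2.8 does.
    """
    base = url.rstrip("/")
    for candidate in (base, base + "/.git"):
        if ok(info_refs_url(candidate)):
            return candidate
    return None
```

The `ok` predicate is injectable so the probing order can be checked without a
running server, e.g. by passing a set of "served" URLs' `__contains__`.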

> >     I think this could provide a complete remedy for 1, since relative URLs
> >     would then be assembled properly, with similar 'sensing' of /.git for
> >     the final URLs.

> >     I guess we could do it with rewrites/forwards on the "server side",
> >     but it wouldn't be a generally acceptable solution.

> > b. is there a better or already existing way to remedy my situation?

> > c. shouldn't "git clone" (or the relevant helper) be aware of the remote
> >    /.git possibly being a gitlink file within a submodule?

> Oh. I think that non-bare repositories including submodules are not designed
> to be cloned, because they are for use in the file system.

Well, that is the beauty of git being a distributed VCS: non-bare repos seem
to be as nicely cloneable as bare ones. And in general it seems to work with
submodules as well, since philosophically they should be "consistent"...

>  Even a local clone fails:

>     # gerrit is a project I know which also has submodules:
>     git clone --recurse-submodules https://gerrit.googlesource.com/gerrit g1
>     git clone --recurse-submodules g1 g2
>     ...
> fatal: clone of '...' into submodule path '...' failed

I guess that is just yet another bug with relative paths in the
submodules.

> So I think for cloning repositories you want to have each repository
> as its own thing (bare or non bare).

In the first line of your example above you have, in a way, shown the
counter-argument to that statement. Indeed, each repository should be its own
thing, just possibly registered as a submodule of another one.

> The submodule mechanism is just a way to express a relation between
> the repositories, it's like composing them together, but by that composition
> it breaks the properties of each repository to be easily clonable.

It doesn't really (except in the cases we both pointed out). E.g. I can just as
easily clone the original sub1 repository, which was registered as a submodule
of another one. Whether git's treatment of submodules during cloning (placing
them under the root repo's .git/modules, etc.) undermines that feature is a
question we could also discuss here, I guess ;)

> I think we should fix that.

That would be awesome! Thanks in advance ;)

> I guess the local clone case is 'easy' as you only need
> to handle the link instead of directory thing correctly.

> For the case you describe (cloning from a remote, whether it is http or ssh),
> we would need to discuss security implications I would assume? It sounds
> scary at first to follow a random git link to the outer space of the repository.

More like "into the inner space". git already (as the example above shows)
descends right away into "/info/refs?", so how would sensing "/.git/" be any
different?

> (A similar thing is that you cannot have symlinks in a git repository pointing
> outside of it, IIRC? At least that was fishy.)

That might indeed be dangerous. But once again, per the argument above, I guess
it is similarly up to the "provider" to guarantee protection, e.g. by forbidding
the webserver from following symlinks in the served directory if its content is
not under his control.
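
Such provider-side protection could be as simple as refusing any resolved path
(a "gitdir:" target or a symlink) that lands outside the served directory. A
minimal Python sketch, assuming the server knows the served root and the
candidate path; is_within_root is a hypothetical helper, not part of any
particular webserver:

```python
import os
import tempfile
from pathlib import Path


def is_within_root(root: Path, target: Path) -> bool:
    """True if `target`, after resolving symlinks and "..", stays inside `root`."""
    root_real = os.path.realpath(root)
    target_real = os.path.realpath(target)
    try:
        return os.path.commonpath([root_real, target_real]) == root_real
    except ValueError:
        # e.g. paths on different drives on Windows
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d) / "served"
        (root / "sub1").mkdir(parents=True)
        # sub1/.git -> ../.git/modules/sub1 stays inside the served tree
        print(is_within_root(root, root / "sub1" / ".." / ".git" / "modules" / "sub1"))  # True
        # a path climbing one level above the root escapes it
        print(is_within_root(root, root / ".." / "elsewhere"))  # False
```

Resolving with realpath before comparing is what catches symlinks, which a
purely textual prefix check on the requested path would miss.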

Cheers, and thanks for your quick reply, Stefan!
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        