Re: [PATCH v3 0/8] Hiding refs

On 02/06/2013 08:17 PM, Junio C Hamano wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
> 
>> On Tue, Feb 5, 2013 at 5:29 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>>> Hiderefs creates a "dark" corner of a remote git repo that can hold
>>> arbitrary content that is impossible for anybody to discover but
>>> nevertheless possible for anybody to download (if they know the name of
>>> a hidden reference).  In earlier versions of the patch series I believe
>>> that it was possible to push to a hidden reference hierarchy, which made
>>> it possible to upload dark content.  The new version appears (from the
>>> code) to prohibit adding references in a hidden hierarchy, which would
>>> close the main loophole that I was worried about.  But the documentation
>>> and the unit tests only explicitly say that updates and deletes are
>>> prohibited; nothing is said about adding references (unless "update" is
>>> understood to include "add").  I think the true behavior should be
>>> clarified and tested.
>>>
>>> I was worried that somehow this "dark" content could be used for
>>> malicious purposes; for example, pushing compromised code then
>>> convincing somebody to download it by SHA1 with the implicit argument
>>> "it's safe since it comes directly from the project's official
>>> repository".  If it is indeed impossible to populate the dark namespace
>>> remotely then I can't think of a way to exploit it.
>>
>> Or you can think hiderefs is the first step to addressing the
>> initial ref advertisement problem.  The series says hidden refs are
>> to be fetched out of band, but that's not the only way.
> 
> Let me help unconfuse this thread.
> 
> I think the series as 8-patch series was poorly presented, and
> separating it into two will help understanding what they are about.
> 
> The first three:
> 
>   upload-pack: share more code
>   upload-pack: simplify request validation
>   upload/receive-pack: allow hiding ref hierarchies
> 
> is _the_ topic of the series.  As far as I am concerned (I am not
> speaking for Gerrit users, but am speaking as the Git maintainer),
> the topic is solely about uncluttering.  There may be refs that the
> server end may need to keep for its operation, but that remote users
> have _no_ business knowing about.  Allowing the server to keep these
> refs in the repository, while not showing these refs over the wire,
> is the problem the series solves.
> 
> In other words, it is not about "these are *usually* not wanted by
> clients, so do not show them by default".  It is about "these are
> not to be shown, ever".
> 
> OK?

Yes, the first three patches sound much more reasonable if this is the
goal.  Do you know of users who want the feature defined by the first
three patches, or is it only a stepping stone towards an actually useful
feature?  (I ask because I have trouble imagining a real-world scenario
where these alone would be useful.)

> Now, there may be some refs that are not *usually* wanted by clients
> but there may be cases where clients want to
> 
>  (1) learn about them via the same protocol; and/or
>  (2) fetch them over the protocol.
> 
> If you want to solve both of these two issues generally, the
> solution has to involve a separate protocol from today's
> protocol.  It would go like this:
[... omitted clear explanation of how delayed advertisement could be
implemented via a new protocol ...]

> But in the meantime, if there is a niche use case where a solution
> to only the second problem is sufficient (and Gerrit and GitHub pull
> requests could both be such use cases), the remainder of the series
> can help, without waiting the solution to solve "usually not wanted
> but may need to be learned" problem.  That is the latter 4 patches
> (the very last one is a demonstration to illustrate why allowing a
> push to hidden ref hierarchy would not and should not work, and is
> not for application):

Given that some people *do* want to fetch all pull requests, is this a
feature that any hosting service would really turn on?  True, the
majority of users would be spared clutter, but at the cost of completely
preventing other users from fetching all pull requests, mirroring the
repository, etc.

In other words, I wonder whether your two incremental steps are useful
at all, in the real world, without yet-to-be-implemented future changes.
If not, then it doesn't make sense to merge them without at least
imagining the final goal and gaining confidence that they are not false
starts.


I think that a more useful interim solution would be to make it easy to
have two URLs accessing a single git repository, with different levels
of reference visibility applied to each.  This is something that
providers could turn on without sacrificing any existing functionality.
And it would solve all three problems: clutter, bandwidth, and
provenance.

Your first three patches would allow two-tier access to be implemented,
for example by setting GIT_CONFIG or GIT_CONFIG_PARAMETERS or
command-line parameters differently for the processes serving the two
URLs, like:

    git upload-pack ...

vs.

    GIT_CONFIG=config-with-hidden-refs git upload-pack ...
or
    git -c transfer.hiderefs=refs/pull upload-pack ...

But this is a bit awkward because the admin would either have to
maintain two config files, or maintain the hiderefs configuration in the
script starting upload-pack rather than in the configuration file.
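
To illustrate the second option, here is a minimal sketch of what such a
start script might contain.  The endpoint names "full" and "uncluttered"
and the function name are made up for this example, and refs/pull is
just the hierarchy used above.  The function only prints the command
line so that the mapping is easy to inspect; a real script would exec
the command instead.

```shell
#!/bin/sh
# Sketch only: "full" and "uncluttered" are hypothetical endpoint names,
# and refs/pull is just the example hierarchy from this message.

# Print the upload-pack command line for one endpoint and repository path.
upload_pack_cmd() {
    endpoint=$1
    repo=$2
    case "$endpoint" in
        full)
            # Unrestricted URL: every ref is advertised.
            printf 'git upload-pack %s\n' "$repo" ;;
        uncluttered)
            # Restricted URL: the hidden hierarchy lives here, in the script.
            printf 'git -c transfer.hiderefs=refs/pull upload-pack %s\n' "$repo" ;;
        *)
            echo "unknown endpoint: $endpoint" >&2
            return 1 ;;
    esac
}
```

Note that the list of hidden hierarchies is now hard-coded in the script
and not in the repository's config file, which is exactly the
awkwardness described above.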

Therefore, I suggest a slight change to how hiderefs are configured to
make two-tier URLs easier to configure, such as

    # Define one or more views:
    [view "uncluttered"]
            hiderefs = refs/pull

    # This would set the default view for all services:
    [transfer]
            view = uncluttered

    # Peff also wanted the possibility to configure each service
    # independently, which could be done like this:
    [receive]
            view = uncluttered
    [uploadpack]
            view = full

I also tentatively suggest that we add a git-level option "--view" and
an environment variable GIT_VIEW (similar to "--namespace" and
GIT_NAMESPACE) to override the default setting:

    GIT_VIEW=uncluttered git upload-pack ...

This way whoever starts the process only needs to choose a particular
view name; the actual definition would reside in the config file.
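
Until something like this exists, the idea could be approximated with a
wrapper on top of transfer.hiderefs from the first three patches.  The
following is a rough sketch under several assumptions: neither GIT_VIEW
nor the [view "<name>"] section is implemented by Git, the function
names are invented, and the repository is assumed to be bare.  The
wrapper reads view.<name>.hiderefs with "git config --get-all" and
turns each value into a "-c transfer.hiderefs=..." argument.

```shell
#!/bin/sh
# Sketch only: the [view "<name>"] section and GIT_VIEW are the proposal
# from this message, not something Git implements.

# Read ref prefixes (one per line) on stdin and print the matching
# "-c transfer.hiderefs=<prefix>" arguments, one word per line.
hiderefs_args() {
    while IFS= read -r prefix; do
        if [ -n "$prefix" ]; then
            printf '%s\n' "-c" "transfer.hiderefs=$prefix"
        fi
    done
}

# Serve the bare repository <repo>, hiding the refs of the view named
# by GIT_VIEW (if it is set).
upload_pack_with_view() {
    repo=$1
    set --
    if [ -n "${GIT_VIEW:-}" ]; then
        # Assumes the prefixes contain no whitespace or glob characters
        # (valid ref names cannot), so word splitting is safe here.
        for word in $(git --git-dir="$repo" config --get-all \
                "view.$GIT_VIEW.hiderefs" | hiderefs_args); do
            set -- "$@" "$word"
        done
    fi
    exec git "$@" upload-pack "$repo"
}
```

With the "uncluttered" view defined as above, calling
upload_pack_with_view with GIT_VIEW=uncluttered would end up running
"git -c transfer.hiderefs=refs/pull upload-pack <repo>", while leaving
GIT_VIEW unset would run plain "git upload-pack <repo>".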

I think these changes would make it easier to support two-tier URLs and
would also leave the way open to use the "view" concept for other things
in the future.


I've said my piece now and am gratified that there has been more
discussion about your proposal, which was my main goal.  Therefore FWIW
I turn my -1 into a -0 and leave it up to the people experiencing more
clutter-induced pain to decide how to proceed.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/