Re: restriction of pulls

Johannes Schindelin wrote:
Hi,

On Fri, 9 Feb 2007, Rogan Dawes wrote:

Johannes Schindelin wrote:
On Fri, 9 Feb 2007, Christoph Duelli wrote:

Is it possible to restrict a checkout, clone, or a later pull to some subdirectory of a repository?
No. In git, a revision really is a revision, and not a group of file revisions.
I thought about how this might be implemented, although I'm not entirely sure how efficient this will be.

There are basically three ways I can think of:

- rewrite the commit objects on the fly. You might want to avoid the use of the pack protocol here (i.e. use HTTP or FTP transport).

- try to teach git a way to ignore certain missing objects and directories. This might be involved, but you could extend upload-pack easily with a new extension for that.

(my favourite:)
- use git-split to create a new branch, which only contains doc/. Do work only on that branch, and merge into mainline from time to time.

If you don't need the history, you don't need to git-split the branch.

You only need to make sure that the newly created branch is _not_ branched off of mainline: otherwise the next merge would _delete_ all files outside of doc/ (merge would see that the files exist in mainline and existed in the common ancestor too, and conclude that they were deleted on the doc branch).

Ciao,
Dscho
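
For reference, the workflow described above might look roughly like this; "git split" is hypothetical here (there is no such stock command), and the essential point is that doc-branch shares no ancestry with mainline, so a later merge does not treat the files outside doc/ as deleted:

$ git split master doc/ doc-branch     # hypothetical: keep only the history of doc/
$ git checkout doc-branch
  ... edit files under doc/ ...
$ git commit -a -m "fix the documentation"
$ git checkout master
$ git merge doc-branch                 # fold the doc/ work back into mainline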


Your third option sounds quite clever, apart from the problem of attributing a commit and a commit message to someone when the rewritten commit no longer matches what they actually did :-(

I also wonder what happens when they check out a few more files. Do we rewrite those commits as well? What happens if the user has already made some commits? What happens if they have already sent those upstream? etc.

I think the best solution is ultimately to make git able to cope with certain missing objects.

I started writing this in response to another message, but it will do fine here, too:

The description I give here will likely horrify people in terms of communications inefficiency, but I'm sure that can be improved.

Scenario:

A user sees a documentation bug in a git-managed project, and decides that she wants to do something about it. Since she is not on the fastest of connections, she'd like to reduce the checkout to a reasonable minimum, while still working with the git tools.

Viewing the repo layout using gitweb, she sees that all the documentation is stored in the docs/ directory from the root.

So, she creates a local repo to work in:

$ git init-db

She configures her local repo to reference the source one:

(Hypothetical syntax)
$ git clone --reference http://example.com/project.git \
    http://example.com/project.git

Since the reference and repo are the same (and non-local), git doesn't actually download anything, other than the current heads (and maybe tags).

She then does a partial checkout of the master branch, but only the docs/ directory:

$ git checkout -p master docs/

The -p flag indicates that this is a partial checkout of master. Git records that the current HEAD is "master", checks out the docs/ directory, and removes any other files in the working directory (that it knew about from the existing index, if any - I'm not suggesting that it should arbitrarily delete files!)

The checkout process goes as follows: resolve the <treeish> that HEAD points to, and retrieve it from the upstream repo if it does not exist locally. Then request only the tree and blob objects needed to satisfy the requested checkout: from the root tree, identify the docs/ entry and request only that tree object, and keep downloading tree and blob objects until the entire docs/ directory can be created in the working directory.
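
As a very rough illustration of that walk, assuming a hypothetical plumbing command "git fetch-object <sha1>" that retrieves a single object from the upstream repo (no such command exists today), the descent into docs/ might look something like this:

    # have(): make sure an object is available locally; "git fetch-object"
    # is hypothetical and stands for "fetch this one object from upstream"
    have () { git cat-file -e "$1" 2>/dev/null || git fetch-object "$1"; }

    checkout_tree () {                 # $1 = tree sha1, $2 = directory to create
        have "$1"
        mkdir -p "$2"
        git ls-tree "$1" | while read mode type sha1 name
        do
            case "$type" in
            tree) checkout_tree "$sha1" "$2/$name" ;;
            blob) have "$sha1" && git cat-file blob "$sha1" > "$2/$name" ;;
            esac
        done
    }

    root=$(git rev-parse master^{tree})      # the commit itself may need fetching first
    docs=$(git ls-tree "$root" docs | awk '{ print $3 }')    # sha1 of the docs/ tree
    checkout_tree "$docs" docs

Only the trees and blobs on the path to docs/ and below it are ever requested; the rest of the project is never transferred.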

This will likely require a new index file format, one that stores the hashes of objects (blobs or trees) that have not been checked out, alongside the usual stat information for the files that have been.

Now create a "negative index" (pindex?) that records the files and directories that were not checked out. Obviously, this does not need to recurse into directories that were not checked out: having the hash of the parent directory in the pindex is sufficient information to reconstruct a new index later. (This might require a new index format that does not list all known files, but simply stores the hash of each unchecked-out tree or blob.)
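
For the scenario above, such a pindex could be as small as one ls-tree-style line per skipped top-level entry (the entry names below are invented purely for illustration; only docs/ was actually checked out):

    100644 blob <sha1 of Makefile>   Makefile   (not checked out)
    040000 tree <sha1 of src tree>   src        (not checked out, never recursed into)

Because a tree's hash pins down everything beneath it, recording just these hashes is enough to reconstruct a complete root tree later.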

Then creating a new commit would require creating the necessary blobs for changed files, new tree objects for trees that change, and a commit object.
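
In plumbing terms, reusing the (hypothetical) pindex entries from above for everything that was never checked out, that could look roughly like this; git mktree takes ls-tree-format lines (tab before the name), and --missing lets it accept objects that are not in the local object database. The file docs/install.txt is just a made-up example:

$ blob=$(git hash-object -w docs/install.txt)     # store the edited file
  # build the new docs/ tree; the unchanged docs/ entries would be listed
  # here too with their old hashes, only the edited file is shown
$ newdocs=$(printf '100644 blob %s\tinstall.txt\n' "$blob" | git mktree --missing)
  # rebuild the root tree: skipped entries keep the hashes recorded in the
  # pindex, only the docs entry changes
$ newroot=$( { printf '100644 blob %s\tMakefile\n' "<sha1 from pindex>"
               printf '040000 tree %s\tdocs\n'     "$newdocs"
               printf '040000 tree %s\tsrc\n'      "<sha1 from pindex>"
             } | git mktree --missing )
  # create the commit on top of the old master and advance the branch
$ newcommit=$(echo "fix the install docs" | git commit-tree $newroot -p $(git rev-parse master))
$ git update-ref refs/heads/master $newcommit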

As far as I can tell, that could then be pushed/pulled/merged using the existing tools, without any problems.

Rogan
