Re: Question: What's the best way to implement directory permission control in git?

On Thu, Jul 28, 2022 at 9:28 AM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Thu, Jul 28 2022, ZheNing Hu wrote:
>
> > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote on Wed, Jul 27, 2022 at 17:20:
> >>
> >>
> >> On Wed, Jul 27 2022, ZheNing Hu wrote:
> >>
> >> > if there is a monorepo such as
> >> > git@xxxxxxxxxx:derrickstolee/sparse-checkout-example.git
> >> >
> >> > There are many files and directories:
> >> >
> >> > client/
> >> >     android/
> >> >     electron/
> >> >     iOS/
> >> > service/
> >> >     common/
> >> >     identity/
> >> >     list/
> >> >     photos/
> >> > web/
> >> >     browser/
> >> >     editor/
> >> >     friends/
> >> > bootstrap.sh
> >> > LICENSE.md
> >> > README.md
> >> >
> >> > Now we can use partial-clone + sparse-checkout to reduce
> >> > the network overhead, and reduce disk storage space size, that's good.
> >> >
> >> > But I also need an ACL to control which directories or files people can fetch/push.
> >> > E.g. I don't want a client to fetch the code in "service" or "web".
> >> >
> >> > Now if the user's client runs "git log -p", "git sparse-checkout add service"
> >> > or some other git command, git will download the missing objects
> >> > automatically via "git fetch --filter=blob:none --stdin <oid>".
> >> >
> >> > Because the git client and server exchange git objects (and don't
> >> > care about paths), we cannot simply ban someone from downloading
> >> > a "path" on the server side.
> >> >
> >> > What should I do? You may recommend me to use submodule,
> >> > but due to its complexity, I don't really want to use it :-(
> >>
> >> There isn't a way to do this in git.
> >>
> >> It's theoretically possible, i.e. a client could be told that the SHA-1
> >> of a directory is XYZ, and construct a commit object with a reference to
> >> it.
> >>
> >
> > I guess you mean use a special reference to hold the restricted path which
> > the client can access, and pre-receive-hook can ban the client from downloading
> > other references. But this method is a little weird... How can this reference
> > sync with main branches? If we have changed client permission to access
> > server directory, how to get the "history" of the server directory?
> >
> > I believe this approach is not very appropriate and is not maintainable.
>
> It's not maintainable at all, and I don't believe any current git client
> supports this.

I agree it's not maintainable and a bad idea.  But I did want to
correct one small thing, and I do have an alternative suggestion at
the end...

> But due to git's commits referring to a Merkle tree I can tell you that
> a subdirectory "secret" has a current tree SHA-1 of XYZ, without giving
> you any of that content.
>
> You *could* then manually construct a commit like:
>
>         tree <NEW_TREE>
>         ...
>
> Where the "<NEW_TREE>" would be a tree like:
>
>         100644 blob <NEW-BLOB-SHA1>     UPDATED.md
>         040000 tree <XYZ>       secret-stuff
>
> And send you a PACK with my three new objects (commit, blob &
> new top-level NEW_TREE). To the remote end & protocol it wouldn't be
> distinguishable from a "normal" push.
>
> But nothing supports this already, as a practical matter most of git
> either hard dies if content is missing, or has other odd edge-case
> semantics (and I'm not up-to-date on the state of the art).

Actually, this is what sparse-index (as a sub-option in
sparse-checkout) already basically does.  See
Documentation/technical/sparse-index.txt for details, and note that
we're basically in Phase IV of that document.  In short, the
sparse-index makes it so that common operations based on the index do
not need and do not use information about some subtrees, so if someone
has a partial clone starting with no blobs, they will only have to
download a small subset of the repository blobs in order to handle
most Git operations, and many operations become much faster since the
index is so much smaller.

However:

* Users can run `git sparse-checkout reapply --no-sparse-index` at any
time to force the index to be full again.  This is documented, and
users are even advised to remember it in case they attempt to use
external tools (jgit? libgit2? others?) that don't understand sparse
directory entries.  So, removing this ability would be problematic.

* It makes no guarantee whatsoever that the sparse directory entries
are not expanded by less frequently used Git commands.  Notice the
"ensure_full_index()" calls sprinkled throughout the code.  Some have
been removed, one by one, as commands have been modified to better
operate with a sparse index.  The odds they'll all be removed in the
future may well be close to 0%.

* The `ort` merge strategy ignores the index altogether during
operation.  If it needs to walk into a tree to complete a
merge/rebase/revert/cherry-pick/etc., it will.  Further, it doesn't
just look into those paths, it intentionally de-sparsifies paths
involved in conflicts, so it can display them to the user.

* Just because the index is sparse does not mean other commands can't
walk into those directories.  So `git grep` (when given a revision),
`git diff`, `git log`, etc. will look in (old versions of) those
paths.

> Anyway, just saying that for the longer term I'm not aware of an
> *intrinsic* reason for why we couldn't support this sort of thing, in
> case anyone's interested in putting in a *lot* of leg work to make it
> happen.

And on top of the technical leg work required, they would also need to
somehow convince everyone else that it's worth accepting the increased
maintenance effort.  Right now, even if someone had already done the
work to implement it, I'd say it's not worth the maintenance costs.

However, there are two alternative choices I can think of here: You
can use submodules if you want a fixed part of the repository to only
be available to a subset of folks, or use josh
(https://github.com/josh-project/josh) if you need it to be more
dynamic.



