Derrick Stolee <derrickstolee@xxxxxxxxxx> wrote on Tue, Aug 9, 2022 at 21:37:
>
> On 8/8/2022 12:15 PM, Junio C Hamano wrote:
> > "ZheNing Hu via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> >
> >> From: ZheNing Hu <adlternative@xxxxxxxxx>
> >>
> >> We already have `--filter=sparse:oid=<oid>`, which can be used to
> >> clone a repository limited to the objects that match the filter
> >> rules in the file corresponding to the <oid> on the git server.
> >> However, it can only read filter rules that have already been
> >> recorded on the git server.
> >
> > Was the reason why we have the "we limit to an object we already have"
> > restriction because we didn't want to blindly use a piece of
> > uncontrolled arbitrary end-user data here?  Just wondering.
>
> One of the ideas here was to limit the opportunity of sending an
> arbitrary set of data over the Git protocol and avoid exactly the
> scenario you mention.
>

I see that sparse-checkout uses a "cone" mode to limit the set of data
sent, which improves performance. I wonder if we can use this mode here.
From a brief look, cone mode ensures that every filter rule we add is a
directory and does not contain the special pattern characters '!', '?',
'*', '[', ']'. But if we transmit the filter rules to the git server,
the server cannot check whether a rule refers to a directory, because
the same path may differ across commits: e.g. in 9e6f67 "test" could be
a directory, while in e5e154e "test" could be a file... I don't know
how to solve this problem...

> Another was that it is incredibly expensive to compute the set of
> reachable objects within an arbitrary sparse-checkout definition,
> since it requires walking trees (bitmaps do not help here). This
> is why (to my knowledge) no Git hosting service currently supports
> this mechanism at scale. At minimum, using the stored OID would
> allow the host to keep track of these pre-defined sets and do some
> precomputing of reachable data using bitmaps to keep clones and
> fetches reasonable at all.
>
> The other side of the issue is that we do not have a good solution
> for resolving how to change this filter in the future, in case the
> user wants to expand their sparse-checkout definition and update
> their partial clone filter.
>
> There used to be a significant issue where a 'git checkout'
> would fault in a lot of missing trees because the index needed to
> reference the files outside of the sparse-checkout definition. Now
> that the sparse index exists, this is less of an impediment, but
> it can still cause some pain.
>
> At this moment, I think path-scoped filters have a lot of problems
> that need solving before they can be used effectively in the wild.
> I would prefer that we solve those problems before making the
> feature more complicated. That's a tall ask, since these problems
> do not have simple solutions.
>

It would be great if we could make such path-scoped filters work,
because then we could combine them with sparse-checkout... Maybe one
day users could run:

    git clone --sparse --filter="sparse:buffer=dir" xxx.git

and get a repository that is already limited to the sparse-checkout
paths... Needless to say, this is very tempting.

> Thanks,
> -Stolee

Thanks,
ZheNing Hu
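
P.S. For concreteness, this is roughly what the existing sparse:oid
flow looks like today (just a sketch: it assumes the filter-spec blob
is already committed on the server, here at the hypothetical path
".filterspec", that the server is configured to allow the sparse:oid
filter, and "dir1"/"dir2" stand in for real directories):

    # the filter spec must already exist as a blob on the server side
    git clone --sparse --filter=sparse:oid=HEAD:.filterspec https://example.com/repo.git
    cd repo
    # cone mode only accepts whole directories, no '!', '?', '*', '[', ']'
    git sparse-checkout set --cone dir1 dir2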