-------------- lilinchao@xxxxxxxxxx >"Li Linchao via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: lilinchao <lilinchao@xxxxxxxxxx> >> >> In some scenarios, users may want more history than the repository >> offered for cloning, which mostly to be a shallow repository, can >> give them. > >Sorry, but I find this hard to understand. Are you saying that most >of the repositories that users try to clone from are shallow? > Oh, sorry, it shoud be "which happens to be a shallow repository". >> But users don't know it is a shallow repository until >> they download it to local, users should have the option to refuse >> to clone this kind of repository, and may want to exit the process >> immediately without creating any unnecessary files. > >This one on the other hand is easy to understand, but we would >probably need something like s/But/But because/. > Ok, I will substitute it. >> Althought there is an option '--depth=x' for users to decide how >> deep history they can fetch, but as the unshallow cloning's depth >> is INFINITY, we can't know exactly the minimun 'x' value that can >> satisfy the minimum integrity, so we can't pass 'x' value to --depth, >> and expect this can obtain a complete history of a repository. > >Hmph, that is an interesting point. This makes me wonder if we can >achieve the same without adding a new option at the UI level (e.g. >by allowing "--depth" to take "infinity" and reject cloning if we >find out that the origin repository is a shallow one). But we can >worry about it later once after we get the machinery driven by the >UI right. > Please let me explain the purpose of this patch in another way: In practice, I may need a filter in clone command to filter out remote shallow repository that may appear. I think a new option like '--reject-shallow', or '--filter-shallow', or something like that, has a clearer purpose for users, they don't have to come up with a specific depth number to achieve the same purpose. A literal filter, or option is just ok, I think. >> In other scenarios, given that we have an API that allow us to import >> external repository, and then perform various operations on the repo. > >Sorry, but I do not understand what you want to say with these two >lines ("Given that X and Y" needs to be followed by a concluding >statement, e.g. "Given that we have API to import and operate, we >can do Z"---you are missing that "we can do Z" part). > Forgive my crappy English :-( What I want to express is "if we have an API to do sth, then do sth". >> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt >> index 02d9c19cec75..af5a97903a05 100644 >> --- a/Documentation/git-clone.txt >> +++ b/Documentation/git-clone.txt >> @@ -15,7 +15,7 @@ SYNOPSIS >> [--dissociate] [--separate-git-dir <git dir>] >> [--depth <depth>] [--[no-]single-branch] [--no-tags] >> [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules] >> - [--[no-]remote-submodules] [--jobs <n>] [--sparse] >> + [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--reject-shallow] > >Isn't it "--[no-]reject-shallow"? Offering the negation from the >command line is essential if "[clone] rejectshallow" configuration >is allowed to set the default to true. > Ok, that makes sense. >> +--reject-shallow:: >> + Don't clone a shallow source repository. In some scenarios, clients >> + want the cloned repository information to be complete. Otherwise, >> + the cloning process should end immediately without creating any files, >> + which can save some disk space. This can override `clone.rejectshallow` >> + from the configuration: >> + >> + -------------------------------------------------------------------- >> + $ git -c clone.rejectshallow=false clone --reject-shallow source out >> + -------------------------------------------------------------------- >> + >> + While there is a way to countermand a configured "I always refuse to >> + clone from a shallow repository" with "but let's allow it only this time": >> + >> + ---------------------------------------------------------------------- >> + $ git -c clone.rejectshallow=true clone --no-reject-shallow source out >> + ---------------------------------------------------------------------- > > >This is way too verbose and gives unnecessary details that readers >already know or do not need to know (e.g. setting configuration from >the command line and immediately override it from the command line >is not something end-users would EVER need to do---only test writers >who develop Git would need it). Something like > > Fail if the source repository is a shallow repository. The > `clone.rejectShallow` configuration variable can be used to > give the default. > >would be sufficient. All readers ought to know when a configuration >and command line option exist, the latter can be used to override >the default former gives, and it is *not* a job for the description >of an individual option to teach them to such a detail like the >above does. > >> +static int option_no_shallow = -1; /* unspecified */ >> +static int config_shallow = -1; /* unspecified */ > >Hmph. I would have expected the usual "prepare a single variable >and initialize it to the default, read the config to set it, and >then parse the command line to overwrite it" sequence would suffice >so it is puzzling why we want two separate variables here. > >Let's read on to find out. > >> @@ -90,6 +92,8 @@ static struct option builtin_clone_options[] = { >> OPT__VERBOSITY(&option_verbosity), >> OPT_BOOL(0, "progress", &option_progress, >> N_("force progress reporting")), >> + OPT_BOOL(0, "reject-shallow", &option_no_shallow, >> + N_("don't clone shallow repository")), >> OPT_BOOL('n', "no-checkout", &option_no_checkout, >> N_("don't create a checkout")), >> OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")), >> @@ -858,6 +862,9 @@ static int git_clone_config(const char *k, const char *v, void *cb) >> free(remote_name); >> remote_name = xstrdup(v); >> } >> + if (!strcmp(k, "clone.rejectshallow")) { >> + config_shallow = git_config_bool(k, v); >> + } >> return git_default_config(k, v, cb); >> } > >You are adding to git_clone_config(), so instead of setting the >value to config_shallow, setting the value to the same variable that >will be used in builtin_clone_options[] array should be sufficient. > >cmd_clone() begins like so: > > > git_config(git_clone_config, NULL); > argc = parse_options(...); > >which means that single variable (let's call it reject_shallow) >can (1) stay to be its initial value if no config or option is >given, (2) if there is config, the git_config() call will cause >that variable assigned, (3) if there is option, parse_options() >call will cause that variable assigned, possibly overwriting the >value taken from the config. > Sorry, you may forget there is a re-read git-config under these lines (around at line 1160): /* * additional config can be injected with -c, make sure it's included * after init_db, which clears the entire config environment. */ write_config(&option_config); /* * re-read config after init_db and write_config to pick up any config * injected by --template and --config, respectively. */ git_config(git_clone_config, NULL); so, what I can think of is introducing a new variable for git_clone_config, and I find that in the other place, "define a new config_xxx variable for git-config" is usual. >Which is exactly what we want. So in short, declare just a single > > static int reject_shallow; /* default to false */ > >instead of "option_no_shallow" and "config_shallow", and use it in >both builtin_clone_options[] given to parse_options, and >git_clone_config() that is given to git_config(), and you'd be fine, >I think. > >> @@ -963,6 +970,7 @@ static int path_exists(const char *path) >> int cmd_clone(int argc, const char **argv, const char *prefix) >> { >> int is_bundle = 0, is_local; >> + int is_shallow = 0; >> const char *repo_name, *repo, *work_tree, *git_dir; >> char *path, *dir, *display_repo = NULL; >> int dest_exists, real_dest_exists = 0; >> @@ -1205,6 +1213,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix) >> >> path = get_repo_path(remote->url[0], &is_bundle); >> is_local = option_local != 0 && path && !is_bundle; >> + >> + /* Detect if the remote repository is shallow */ >> + if (!access(mkpath("%s/shallow", path), F_OK)) { >> + is_shallow = 1; >> + } > >This is only for cloning from a local repository, no? IOW, path at >this point may even be "git://github.com/git/git/" and checking with >access() does not make sense. > >Ah, it is even worse. get_repo_path() can return NULL, so mkpath() >will crash in such a case. This must be at least > > if (path && !access(mkpath("%s/shallow", path), F_OK)) > is_shallow = 1; > >but I think the logic fits better in the body of "if (is_Local)" >thing that immediately follows. It is specific to the case where >cloning from a local repository and access(mkpath()) that is about >the local filesystem (as opposed to going through the transport >layer) belongs there. > >> if (is_local) { >> if (option_depth) >> warning(_("--depth is ignored in local clones; use file:// instead.")); >> @@ -1214,7 +1228,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) >> warning(_("--shallow-exclude is ignored in local clones; use file:// instead.")); >> if (filter_options.choice) >> warning(_("--filter is ignored in local clones; use file:// instead.")); >> - if (!access(mkpath("%s/shallow", path), F_OK)) { >> + if (is_shallow) { >> if (option_local > 0) >> warning(_("source repository is shallow, ignoring --local")); >> is_local = 0; > >So, I think the above two hunks are making the code worse. If we >are to detect and reject cloning from the shallow repository when >going through the transport layer (i.e. "--no-local" or cloning from >"git://github.com/git/git", or "https://github.com/git/git", if it >were a shallow repository), that must be handled separately. > Sorry, I made the question simple. Reject cloning a shallow repository should apply to all four type transport protocols. There still a bunch of work to be done. >> @@ -1222,6 +1236,25 @@ int cmd_clone(int argc, const char **argv, const char *prefix) >> } >> if (option_local > 0 && !is_local) >> warning(_("--local is ignored")); >> + >> + if (is_shallow) { >> + int reject = 0; >> + >> + /* If option_no_shallow is specified from CLI option, >> + * ignore config_shallow from git_clone_config. >> + */ >> + >> + if (config_shallow != -1) { >> + reject = config_shallow; >> + } >> + if (option_no_shallow != -1) { >> + reject = option_no_shallow; >> + } > >I do not think any of the above is necessary with just a single >reject_shallow variable that is initialized to 0, can be set by >git_config() callback, and can further be set by parse_options(). > >> + if (reject) { >> + die(_("source repository is shallow, reject to clone.")); >> + } > >> + } >> + >> transport->cloning = 1; >> >> transport_set_option(transport, TRANS_OPT_KEEP, "yes"); > >I do not see how this change would allow users to reject cloning >http://github.com/git/git, if that repository were shallow, though. >I think that would need changes to the code that interacts with >these transport_* functions we see later part of this functrion. > >Thanks. On the whole, thank you for all your patience and suggestions! I will dig into the code, and base on your suggestions to figure it out. Finally, I wish you a Happy Lunar New Year! ^-^ -Lilinchao