Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:

> The use case is
>
>   tar -xzf bigproject.tar.gz
>   cd bigproject
>   git init
>   git add .
>   # git grep or something

Two obvious thoughts, and a half.

(1) This particular invocation of "git add" can easily detect that it
    is run in a repository with no $GIT_INDEX_FILE yet, which is the
    most typical case for a big initial import.  It could even check
    whether the current branch is unborn if you wanted to make the
    heuristic more specific to this use case.  Perhaps it would make
    sense to automatically plug in the bulk import machinery in such a
    case, without an option?

(2) Imagine performing a dry-run of update_files_in_cache() with a
    different diff-files callback, one similar to update_callback()
    but that uses the lstat(2) data to see how big an import this
    really is, instead of calling add_file_to_index(), before actually
    registering the data to the object database.  If you benchmark how
    expensive that is, you may find that such a scheme is a workable
    auto-tuning mechanism to trigger this.  Even if it were moderately
    expensive, combined with the heuristic in (1), it might be
    worthwhile to do only when the operation is likely to be an
    initial import.

(3) Is it always a good idea to send everything to a packfile on a
    large addition, or are you often better off importing the initial
    fileset as loose objects?  If the latter, the option name "--bulk"
    may give users the wrong hint: "if you are doing a bulk import,
    you are better off using this option".

This is a very logical extension to what was started at 568508e7
(bulk-checkin: replace fast-import based implementation, 2011-10-28),
and I like it.  I suspect "--bulk=<threshold>" might be a better
alternative than unconditionally setting the threshold to zero,
though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
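The dry-run idea in point (2) can be approximated from the shell: measure the worktree using directory metadata only, without reading file contents, and use the total to decide between loose objects and the bulk machinery. This is a rough sketch, not the proposed patch; it uses GNU find's -printf, and the 100MB threshold is an arbitrary number made up for the example.

```shell
# Pre-measure the worktree from lstat-style metadata alone, in the
# spirit of point (2): no file contents are read, no objects written.
# Requires GNU find; the 100MB cutoff is illustrative only.
total=$(find . -path ./.git -prune -o -type f -printf '%s\n' |
        awk '{ s += $1 } END { print s + 0 }')
threshold=$((100 * 1024 * 1024))
if [ "$total" -gt "$threshold" ]; then
	echo "large import ($total bytes): bulk checkin would pay off"
else
	echo "small import ($total bytes): loose objects are fine"
fi
```

A real implementation would of course reuse the diff-files walk that "git add" already performs rather than a second directory traversal.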