Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:

> The use case is
>
>   tar -xzf bigproject.tar.gz
>   cd bigproject
>   git init
>   git add .
>   # git grep or something

Two obvious thoughts, and a half.

(1) This particular invocation of "git add" can easily detect that it
    is run in a repository with no $GIT_INDEX_FILE yet, which is the
    most typical case for a big initial import.  It could even check
    whether the current branch is unborn if you wanted to make the
    heuristic more specific to this use case.  Perhaps it would make
    sense to automatically plug in the bulk import machinery in such a
    case, without an option?

(2) Imagine performing a dry-run of update_files_in_cache() with a
    different diff-files callback, one similar to update_callback()
    but that uses the lstat(2) data to see how big an import this
    really is, instead of calling add_file_to_index(), before actually
    registering the data to the object database.  If you benchmark how
    expensive that is, you may find that such a scheme is a workable
    auto-tuning mechanism to trigger this.  Even if it were moderately
    expensive, combined with the heuristic in (1), it might be
    worthwhile to do only when the operation is likely to be an
    initial import.

(3) Is it always a good idea to send everything to a packfile on a
    large addition, or are you often better off importing the initial
    fileset as loose objects?  If the latter, the option name "--bulk"
    may give users the wrong hint: "if you are doing a bulk import,
    you are better off using this option".

This is a very logical extension to what was started at 568508e7
(bulk-checkin: replace fast-import based implementation, 2011-10-28),
and I like it.  I suspect "--bulk=<threshold>" might be a better
alternative than unconditionally setting the threshold to zero,
though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
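The dry-run idea in point (2) can be approximated from the shell: measure the worktree using directory metadata only, without reading file contents, and use the total to decide between loose objects and the bulk machinery. This is a rough sketch, not the proposed patch; it uses GNU find's -printf, and the 100MB threshold is an arbitrary number made up for the example.

```shell
# Pre-measure the worktree from lstat-style metadata alone, in the
# spirit of point (2): no file contents are read, no objects written.
# Requires GNU find; the 100MB cutoff is illustrative only.
total=$(find . -path ./.git -prune -o -type f -printf '%s\n' |
        awk '{ s += $1 } END { print s + 0 }')
threshold=$((100 * 1024 * 1024))
if [ "$total" -gt "$threshold" ]; then
	echo "large import ($total bytes): bulk checkin would pay off"
else
	echo "small import ($total bytes): loose objects are fine"
fi
```

A real implementation would of course reuse the diff-files walk that "git add" already performs rather than a second directory traversal.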