Re: Git performance results on a large repository

On Fri, Feb 10, 2012 at 4:06 AM, Joshua Redstone <joshua.redstone@xxxxxx> wrote:
> Hi Nguyen,
> I like the notion of using --assume-unchanged to cut down the set of
> things that git considers may have changed.
> It seems to me that there may still be situations that require operations
> on the order of the # of files in the repo and hence may still be slow.
> Following is a list of potential candidates that occur to me.
>
> 1. Switching branches, especially if you switch to an old branch.
> Sometimes I've seen branch switching taking a long time for what I thought
> was close to where HEAD was.
>
> 2. Interactive rebase in which you reorder a few commits close to the tip
> of the branch (I observed this taking a long time, but haven't profiled it
> yet).  I include here other types of cherry-picking of commits.
>
> 3. Any working directory operations that fail part-way through and make
> you want to do a 'git reset --hard' or at least a full 'git-status'.  That
> is, when you have reason to believe that files with 'assume-unchanged' may
> have accidentally changed.

All of these involve unpack_trees(), which is a full-tree operation:
the bigger your worktree is, the slower it gets. That is another good
reason to split unrelated parts into separate repositories.


> 4. Operations that require rewriting the index - I think git-add is one?
>
> If the working-tree representation is the full set of all files
> materialized on disk and it's the same as the representation of files
> changed, then I'm not sure how to avoid some of these without playing file
> system games or using wrapper scripts.
>
> What do you (or others) think?
>
>
> Josh
>
>
> On 2/7/12 8:43 AM, "Nguyen Thai Ngoc Duy" <pclouds@xxxxxxxxx> wrote:
>
>>On Mon, Feb 6, 2012 at 10:40 PM, Joey Hess <joey@xxxxxxxxxxx> wrote:
>>>> Someone on HN suggested making assume-unchanged files read-only to
>>>> avoid 90% of the cases of accidentally changing a file without telling
>>>> git. When the assume-unchanged bit is cleared, the file is made
>>>> read-write again.
>>>
>>> That made me think about using assume-unchanged with git-annex since it
>>> already has read-only files.
>>>
>>> But, here's what seems a misfeature...
>>
>>because, well.. assume-unchanged was designed to avoid stat() and
>>nothing else. We are basing a new feature on top of it.
>>
>>> If an assume-unchanged file has
>>> modifications and I git add it, nothing happens. To stage a change, I
>>> have to explicitly git update-index --no-assume-unchanged and only then
>>> git add, and then I need to remember to reset the assume-unchanged bit
>>> when I'm done working on that file for now. Compare with running git mv
>>> on the same file, which does stage the move despite assume-unchanged. (So
>>> does git rm.)
>>
>>This is normal in the lock-based "checkout/edit/checkin" model. mv/rm
>>operate on directory content, which is not "locked - no edit allowed"
>>(in our case --assume-unchanged) in git. But the lock-based model does
>>not map well to git anyway: it has no index (which may make things
>>more complicated), and at the index level git does not really
>>understand directories.
>>
>>I think we could add a protection layer to the index, where any change
>>(including removal) to an index entry is only allowed if the entry is
>>"unlocked" (i.e. no assume-unchanged bit). Locked entries are read-only
>>and have the assume-unchanged bit set. "git lock" and "git unlock"
>>would be introduced as new UI. Would that make assume-unchanged
>>friendlier?
>>--
>>Duy
>>--
>>To unsubscribe from this list: send the line "unsubscribe git" in
>>the body of a message to majordomo@xxxxxxxxxxxxxxx
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
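
To make the protection layer proposed in the quoted message a bit more
concrete, here is a rough Python sketch of the intended behaviour. It is
only a toy model, not git code: ToyIndex, Entry, LockedEntryError and the
lock()/unlock() methods are hypothetical names invented for illustration,
and the "blob" strings merely stand in for staged content. The one rule it
tries to capture is that add, remove and move all refuse to touch an entry
whose assume-unchanged bit is set.

```python
from dataclasses import dataclass, field


class LockedEntryError(Exception):
    """Raised when a change targets a locked index entry."""


@dataclass
class Entry:
    blob: str                       # stand-in for the staged content id
    assume_unchanged: bool = False  # in this model, set means "locked"


@dataclass
class ToyIndex:
    """Hypothetical in-memory index; not git's real data structure."""
    entries: dict = field(default_factory=dict)

    def _require_unlocked(self, path):
        # Every mutating operation goes through this single check.
        entry = self.entries.get(path)
        if entry is not None and entry.assume_unchanged:
            raise LockedEntryError(f"{path} is locked; unlock it first")

    def add(self, path, blob):
        self._require_unlocked(path)
        self.entries[path] = Entry(blob)

    def remove(self, path):
        self._require_unlocked(path)
        self.entries.pop(path, None)

    def move(self, src, dst):
        # Unlike today's mv/rm behaviour, a locked entry blocks the move.
        self._require_unlocked(src)
        self._require_unlocked(dst)
        self.entries[dst] = self.entries.pop(src)

    def lock(self, path):
        self.entries[path].assume_unchanged = True

    def unlock(self, path):
        self.entries[path].assume_unchanged = False


if __name__ == "__main__":
    index = ToyIndex()
    index.add("big/file.c", "blob-1")
    index.lock("big/file.c")
    try:
        index.add("big/file.c", "blob-2")   # refused while locked
    except LockedEntryError as err:
        print(err)
    index.unlock("big/file.c")
    index.add("big/file.c", "blob-2")       # allowed again
```

Routing every mutation through one check is what would make add, mv and
rm behave consistently for locked entries, which is the inconsistency
raised earlier in the thread.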



-- 
Duy