Re: [PATCH] commit: write out cache-tree information

Junio C Hamano wrote:
> <trast@xxxxxxxxxxxxxxx> writes:
>
> > From: Thomas Rast <trast@xxxxxxxxxxxxxxx>
> >
> > While write-tree has code to write out the cache-tree information
> > (since we have to compute it anyway if the cache is stale), commit
> > lost this capability when it became a builtin and moved away from
> > using write-tree.
>
> Earlier the code read from the index, made sure that it is not unmerged by
> running cache_tree_update(), before running prepare-commit-msg hook. The
> hook used to see the index that was read in this codepath which is the
> same as what pre-commit left us.
>
> Why run an extra I/O here? The index file could be quite large, and I do
> not want people to be writing it out without good reason.

Ok, so let's run some numbers.  With the first test script below I'm
seeing:

  before patch:
    $ time ./commit-in-large-tree.sh
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    6.9M    .git/index

    real    1m31.607s
    user    0m57.604s
    sys     0m29.976s

  after patch: 14% speedup
    $ time ./commit-in-large-tree.sh
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    7.0M    .git/index

    real    1m18.521s
    user    0m53.430s
    sys     0m22.138s

On the other hand if you touch every file as in the second script:

  before patch:
    $ time ./commit-in-large-tree-2.sh 
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    6.9M    .git/index

    real    1m40.910s
    user    0m58.731s
    sys     0m38.011s

  after patch: 5% slowdown
    $ time ./commit-in-large-tree-2.sh 
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    7.0M    .git/index

    real    1m45.465s
    user    1m2.329s
    sys     0m38.849s
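
As a sanity check on the percentages quoted above, here is a small sketch
that derives them from the "real" times.  The helper name pct_change is
made up for illustration; a positive result means the patched run was
faster, a negative one means it was slower.

```shell
#!/bin/sh
# Hypothetical helper: relative change in wall-clock time, in percent.
# $1 = seconds before the patch, $2 = seconds after the patch.
pct_change () {
    awk -v before="$1" -v after="$2" \
	'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# First script: 1m31.607s -> 1m18.521s
pct_change 91.607 78.521	# prints 14.3

# Second script: 1m40.910s -> 1m45.465s
pct_change 100.910 105.465	# prints -4.5
```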

I also ran a variant of the latter test that touches one file in only
100 (instead of all 1000) subdirs per commit, and there the patch is
still a speedup.
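
The script for that variant is not included in this mail.  A minimal
sketch of what its modification step could look like, assuming the only
difference from commit-in-large-tree-2.sh is that the inner loop stops at
100 instead of 1000 (touch_subdirs is a hypothetical helper name, not
something from the original scripts):

```shell
#!/bin/sh
# Hypothetical helper: rewrite file number $1 in subdirectories 1..$2
# of the current directory.  Assumes those subdirectories exist.
touch_subdirs () {
    round=$1
    ndirs=$2
    j=1
    while [ "$j" -le "$ndirs" ]; do
	echo "$round changed" > "$j/$round"
	j=$((j + 1))
    done
}
```

In the commit loop of the second script, the inner "for j" loop would
then become a call such as: touch_subdirs "$i" 100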

So I guess it depends on whether we expect users to mostly modify a
small part of the tree or the whole tree.

Regarding your other email

> When we are running a partial commit, the index file you are writing back
> is a temporary index only to build a tree object to record in the commit,
> which we already have done, and the temporary will be discarded.

that's a valid point that I need to address.



-- 8< --   commit-in-large-tree.sh
#!/bin/sh

set -e

git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
    mkdir $i
    (
	cd $i
	for j in $(seq 1 100); do
	    echo $j > $j
	done
    )
    git add $i
done
git commit -q -m initial
du -h .git/index

for i in $(seq 1 100); do
    echo "$i changed" > $i/$i
    git add $i/$i
    git commit -q -m $i
done

rm -rf /dev/shm/commit-in-large-tree.tmp
-- >8 --

-- 8< --  commit-in-large-tree-2.sh
#!/bin/sh

set -e

git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
    mkdir $i
    (
	cd $i
	for j in $(seq 1 100); do
	    echo $j > $j
	done
    )
    git add $i
done
git commit -q -m initial
du -h .git/index

for i in $(seq 1 100); do
    for j in $(seq 1 1000); do
	echo "$i changed" > $j/$i
    done
    git add -u
    git commit -q -m $i
done

rm -rf /dev/shm/commit-in-large-tree.tmp
-- >8 --

--
Thomas Rast
trast@{inf,student}.ethz.ch