Re: [PATCH 5/6] dir: replace exponential algorithm with a linear one

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 31, 2020 at 9:13 AM SZEDER Gábor <szeder.dev@xxxxxxxxx> wrote:
>
> On Wed, Jan 29, 2020 at 10:03:42PM +0000, Elijah Newren via GitGitGadget wrote:
> > Part of the tension noted above is that the treatment of a directory can
> > changed based on the files within it, and based on the various settings
>
> s/changed/change/, or perhaps s/changed/be changed/ ?
>
> > Since dir.c is somewhat complex, extra cruft built up around this over
> > time.  While trying to unravel it, I noticed several instances where the
> > first call to read_directory_recursive() would return e.g.
> > path_untracked for a some directory and a later one would return e.g.
>
> s/for a some/for some/
>
> > However, on the positive side, it does make the code much faster.  For
> > the following simple shell loop in an empty repository:
> >
> >   for depth in $(seq 10 25)
> >   do
> >     dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >     rm -rf dir
> >     mkdir -p $dirs
> >     >$dirs/untracked-file
> >     /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> >   done
> >
> > I saw the following timings, in seconds (note that the numbers are a
> > little noisy from run-to-run, but the trend is very clear with every
> > run):
> >
> >     10: 0.03
> >     11: 0.05
> >     12: 0.08
> >     13: 0.19
> >     14: 0.29
> >     15: 0.50
> >     16: 1.05
> >     17: 2.11
> >     18: 4.11
> >     19: 8.60
> >     20: 17.55
> >     21: 33.87
> >     22: 68.71
> >     23: 140.05
> >     24: 274.45
> >     25: 551.15
> >
> > After this fix, those drop to:
> >
> >     10: 0.00
> >     11: 0.00
> >     12: 0.00
> >     13: 0.00
> >     14: 0.00
> >     15: 0.00
> >     16: 0.00
> >     17: 0.00
> >     18: 0.00
> >     19: 0.00
> >     20: 0.00
> >     21: 0.00
> >     22: 0.00
> >     23: 0.00
> >     24: 0.00
> >     25: 0.00
>
> I agree with Derrick here: if you just said that all these report
> 0.00, I would have taken your word for it.

Thanks, I'll include all these fixes.  Good timing too, as I was about
to send a re-roll.

> Having said that...  I don't know how to get more decimal places out
> of /use/bin/time, but our trace performance facility uses nanosecond
> resolution timestamps.  So using this command in the loop above:
>
>   GIT_TRACE_PERFORMANCE=2 git status --ignored 2>&1 >/dev/null |
>     sed -n -e "s/.* performance: \(.*\): git command.*/$depth: \1/p"
>
> gave me this:
>
>   1: 0.000574302 s
>   2: 0.000584995 s
>   3: 0.000608684 s
>   4: 0.000951336 s
>   5: 0.000762019 s
>   6: 0.000816685 s
>   7: 0.000672516 s
>   8: 0.000912628 s
>   9: 0.000661538 s
>   10: 0.000687465 s
>   11: 0.000708880 s
>   12: 0.000693754 s
>   13: 0.000726120 s
>   14: 0.000737334 s
>   15: 0.000787362 s
>   16: 0.000856687 s
>   17: 0.000780892 s
>   18: 0.000790798 s
>   19: 0.000834411 s
>   20: 0.000859094 s
>   21: 0.001230912 s
>   22: 0.001048852 s
>   23: 0.000891057 s
>   24: 0.000934097 s
>   25: 0.001051704 s
>
> Not sure it's worth including, though.

Yeah, I'm afraid people will spend time trying to analyze it and the
numbers are extremely noisy.  I instead included some words about
counting the number of untracked files opened according to strace,
which shows before we had 2^(1+$depth)-2 untracked directories get
opened and after we had exactly $depth get opened.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux