Re: [PATCH v2] clone: report duplicate entries on case-insensitive filesystems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 9, 2018 at 2:59 PM Jeff King <peff@xxxxxxxx> wrote:
> On Thu, Aug 09, 2018 at 02:53:42PM -0700, Elijah Newren wrote:
>
> > On Thu, Aug 9, 2018 at 2:44 PM Jeff King <peff@xxxxxxxx> wrote:
> > > > The error message isn't quite as good, but does the user really need
> > > > all the names of the file?  If so, we gave them enough information to
> > > > figure it out, and this is a really unusual case anyway, right?
> > > > Besides, now we're back to linear performance....
> > >
> > > Well, it's still quadratic when they run O(n) iterations of "git
> > > ls-files -s | grep $colliding_oid". You've just pushed the second linear
> > > search onto the user. ;)
> >
> > Wouldn't that be their own fault for not running
> >   git ls-files -s | grep -e $colliding_oid_1 ... -e $colliding_oid_n | sort -k 2
> > ?   ;-)
>
> Man, this thread is the gift that keeps on giving. :)
>
> That's still quadratic, isn't it? You've just hidden the second
> dimension in the single grep call.

It may depend on the implementation within grep.  If I had remembered
to pass -F, and if wikipedia's claims[1] about the Aho-Corasick
algotihrm being the basis of the original Unix command fgrep, and it's
claims that this algorithm is "linear in the length of the strings
plus the length of the searched text plus the number of output
matches", and I didn't just misunderstand something there, then it
looks like it could be linear.

[1] https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

Of course, the grep implementation could also do something stupid and
provide dramatically worse behavior than running the command N times
specifying one pattern each time.[2]

[2] http://savannah.gnu.org/bugs/?16305

> Now since these are all going to be constant strings, in theory an
> intelligent grep could stick them all in a search trie, and match each
> line with complexity k, the length of the matched strings. And since
> k=40, that's technically still linear overall.

Looks like Aho-Corasick uses "a finite-state machine that resembles a
trie", so definitely along those lines.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux