Re: Re: Fuzzy Array Search

It runs fast on my 2.33 GHz Core 2 and, on this small data set, about as fast
on the dual 6-core box with 96GB of RAM or the 8-core box with 9GB. Speed
depends on the size of your data set and on memory speed and latency, and only
minimally on processing power (again, assuming a small data set).

That said, you could probably do some clever things to narrow the range you
search. For example, you could implode the array, search the resulting string,
capture the match offset, and divide it by the average record size to estimate
which record the match is in. That would let you skip many records that, with
a certain probability, do not contain the result, so in most cases the search
would run faster by never looking at the majority of the data. This would
actually be interesting to test. You could also sort the array to narrow the
search further by some criterion, or what have you. All of this applies only
if you are searching very large data sets; I am talking about multiple
billions of data points. And all that said, arrays are not really a good data
structure for searching anyway, which is why they are rarely used for that in
file systems or in-memory indexes ;)

Shawn, == is not good for string comparison, since it compares numeric-looking
strings as numbers. It's a bad habit one should get out of; use ===, it's much
safer.
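A few lines showing what I mean. The values are made up for illustration;
the behaviour is PHP's standard numeric-string comparison, which also applies
to array_search() and in_array() unless you pass true as the third (strict)
argument.

```php
<?php
// With ==, two numeric-looking strings are compared as numbers, not as text.
var_dump("1e3" == "1000");   // bool(true): both parse as the number 1000
var_dump("1e3" === "1000");  // bool(false): different strings
var_dump("010" == "10");     // bool(true)

// Same trap in searches: array_search() uses == unless $strict is true.
$haystack = ["1e3", "1000"];
var_dump(array_search("1000", $haystack));       // int(0): matched "1e3"
var_dump(array_search("1000", $haystack, true)); // int(1)
```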

Also, try the same algorithm on 100000 arrays of, say, 10 to 1000 values each;
that would give you better performance statistics :)
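Something like the following harness is what I have in mind. It is only a
sketch: the helper names, the random 'item-N' values and the default of 1000
arrays are placeholders (raise the count toward 100000 for real numbers), and
timings will of course vary by machine.

```php
<?php
// Build $numArrays arrays of random size between $minSize and $maxSize.
// The 'item-N' values are hypothetical test data.
function makeDatasets(int $numArrays, int $minSize = 10, int $maxSize = 1000): array
{
    $datasets = [];
    for ($a = 0; $a < $numArrays; $a++) {
        $size = mt_rand($minSize, $maxSize);
        $data = [];
        for ($i = 0; $i < $size; $i++) {
            $data[] = 'item-' . mt_rand(0, 99999);
        }
        $datasets[] = $data;
    }
    return $datasets;
}

// Loop with preg_match(), keeping each match under its original key.
function loopSearch(array $data, string $pattern): array
{
    $hits = [];
    foreach ($data as $key => $value) {
        if (preg_match($pattern, $value) === 1) {
            $hits[$key] = $value;
        }
    }
    return $hits;
}

// preg_grep() does the same filtering internally and also keeps the keys.
function grepSearch(array $data, string $pattern): array
{
    return preg_grep($pattern, $data);
}

// Total wall-clock seconds to run $search over every dataset.
function timeSearch(callable $search, array $datasets, string $pattern): float
{
    $start = microtime(true);
    foreach ($datasets as $data) {
        $search($data, $pattern);
    }
    return microtime(true) - $start;
}

$datasets = makeDatasets(1000);
$pattern = '/item-1234/';
printf("loop + preg_match: %.6f s\n", timeSearch('loopSearch', $datasets, $pattern));
printf("preg_grep:         %.6f s\n", timeSearch('grepSearch', $datasets, $pattern));
```

Both search functions return identical arrays, so the comparison measures only
the cost of looping in PHP versus inside preg_grep().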



-- Alex

--
The trouble with programmers is that you can never tell what a programmer is
doing until it’s too late.  ~Seymour Cray



On Tue, Jun 7, 2011 at 5:25 PM, Shawn McKenzie <nospam@xxxxxxxxxxxxx> wrote:

> On 06/07/2011 03:57 PM, Floyd Resler wrote:
> >
> > On Jun 7, 2011, at 4:42 PM, Alex Nikitin wrote:
> >
> >> If you don't need the location, you can implode the array and test it
> >> with preg_match(); that gives you about a 4.5x performance increase, but
> >> it won't give you the location, only whether a certain value exists
> >> within the array. You can do some really clever math to get your search
> >> parameters from there, which would be feasible on really large data sets,
> >> but if you want the location, you will have to iterate at some point...
> >>
> >> (Sorry, I keep hitting Reply instead of Reply All.)
> >>
> >> On Tue, Jun 7, 2011 at 2:57 PM, Shawn McKenzie <nospam@xxxxxxxxxxxxx> wrote:
> >>
> >>> On 06/07/2011 12:45 PM, Floyd Resler wrote:
> >>>> What would be the easiest way to do a fuzzy array search? Can I do
> >>>> this without having to step through the array?
> >>>>
> >>>> Thanks!
> >>>> Floyd
> >>>>
> >>>
> >>> I use preg_grep()
> >>>
> >>> --
> >>> Thanks!
> >>> -Shawn
> >>> http://www.spidean.com
> >>>
> >
> > I actually do need the location since I need to get the resulting match.
> > I went ahead and tried to iterate the array and it was MUCH faster than I
> > expected it to be! Of course, considering the machine I'm running this on
> > is a monster (2.66 GHz, 8 cores, 24GB of RAM), it shouldn't have surprised me!
> >
> > Thanks!
> > Floyd
> >
>
> If you are using a straight equality comparison, then the loop would be
> faster (but then array_search() would probably be better); however, if you
> need to use preg_match() in the loop (a "fuzzy search"), then preg_grep()
> will be much faster than the loop.
>
> LOOP WITH PREG_MATCH: 100000
>  0.435957 seconds
> PREG_GREP: 100000
>  0.085604 seconds
>
> LOOP WITH IF ==: 100000
>  0.044594 seconds
> PREG_GREP: 100000
>  0.091519 seconds
>
> --
> Thanks!
> -Shawn
> http://www.spidean.com
>
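On Floyd's point about needing the location: preg_grep() keeps the original
array keys, so the matches already carry their positions, and for an exact
value array_search() with strict comparison returns the key directly. A small
sketch with made-up records:

```php
<?php
// Hypothetical records with non-sequential keys, to show keys are preserved.
$records = [3 => 'red apple', 7 => 'green pear', 9 => 'apple pie'];

// Fuzzy search: matches keep their original keys.
$matches = preg_grep('/apple/i', $records); // [3 => 'red apple', 9 => 'apple pie']
$locations = array_keys($matches);          // [3, 9]

// Exact search: strict (===) comparison, returns the key or false.
$key = array_search('green pear', $records, true); // 7

// PREG_GREP_INVERT returns the non-matching elements, also keyed.
$others = preg_grep('/apple/i', $records, PREG_GREP_INVERT); // [7 => 'green pear']
```

So you do not have to give up the location to get preg_grep()'s speed.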
