Re: Finding duplicate contacts..

Micah Stevens <micah@xxxxxxxxxxxxxxxxxx> · Tue, 26 Jul 2005 12:15:24 -0700

Sounds like it's going awefully slow, I have a table with several million 
entires, and it only takes 10-15 seconds to search through.. on a single 
processor system with 1 gig of ram too.. 

What's the query you're using?

-Micah 

On Tuesday 26 July 2005 11:24 am, tg-php@xxxxxxxxxxxxxxxxxxxxxx wrote:
> Ok, I'm at a point where I'm just going to throw this out there and see if
> anyone has any good solutions because I'm just not seeing one.  Of course
> there's always a better solution, but I'm too brain dead right now to see
> it.  So any input would be appreciated.
>
> Also..forgive me if this is more of a MySQL issues.. I'm trying to find the
> right combination of MySQL and PHP interaction to make this all work
> properly so thought this would be the better forum (versus just the MySQL
> lists) and that there would be plenty of people with insight on both to add
> their 2 cents.
>
> Basic info:
>
> We have a table with contacts and we need to determine if someone's already
> in there but with possibly slightly different info or if the old data
> contains duplicates.  Current criteria (I'm sure there's better):
>
> If home phone matches any existing home, work or alt phones.
> If work phone matches any existing home, work or alt phones.
> If alt phone matches any existing home, work or alt phones.
> If a concat of last name, street address and zip match any existing.
>
> Table currently contains about 60,000 entries (imported from old system)
> and we can expect it to grow.
>
> Indexes are on home, work and alt phones, as well as last name, street
> address and zip.  Tried it without indexes, it was slow.. tried it with
> indexes.. still about the same speed.
>
> Tried seeing if putting the dupe check into a prepared statement sped it up
> any.  It didn't.
>
> Analyzed table.  Same speed.
>
> Out of 60,000 entries, we have roughly 40,000+ unique phone numbers (that's
> doing a union on all three phone number columns).  So even pre-compiling
> just that list and querying another table is going to be similar in time
> issues it seems.
>
> For new contacts being brought in, we might have 30-60 at a time (maybe
> more) so this query would end up being run on each of the 30-60.  At 1.5-2
> sec for each query, that ends up being a full minute or two to do the whole
> scan.
>
> If I pull the phone #'s and name/address/zip combies into an array in PHP
> first, it's only like 12 secs, but each user would end up pulling it (and
> maybe store in a session variable or something) and re-pull it periodically
> to make sure it's semi-fresh.  This sounds like the quicker, but not the
> "best" solution.
>
> MySQL version 4.1.11-standard
>
> PHP version 4.3.4
>
>
> Any tricks, tips or philosophical and/or theoretical things I've just not
> seeing, please let me know.
>
> Thanks
>
> -TG
>
>
> ___________________________________________________________
> Sent by ePrompter, the premier email notification software.
> Free download at http://www.ePrompter.com.

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php