Thinking a little more, painful as it is to me ;-)...
Perhaps changes to the structure here might make a difference. While this
may mean a change to the application to account for this it may make
searches easier and faster.
person_table
person_id
fname
lname
ssn
other data as req'd
phone_records_table
record_id
person_id
phone_no_type (home, bus, alt)
phone_number
extention
address_table
record_id
person_id
address1
address2 - if req'd
city
state
zip
hash_value (hash on address1, city,state,zip)
The penultimate goal is to reduce the number of columns to search to make it
quicker....
You haven't posted the structure, so this is just a suggestion...
Bastien
From: "Bastien Koert" <bastien_k@xxxxxxxxxxx>
To: tg-php@xxxxxxxxxxxxxxxxxxxxxx, php-db@xxxxxxxxxxxxx
Subject: RE: Finding duplicate contacts..
Date: Tue, 26 Jul 2005 15:07:14 -0400
Obviously searching all the columns is creating a bottleneck, at first
glance I would consider some hashing on the values to make the search
simpler.
If home phone matches any existing home, work or alt phones.
If work phone matches any existing home, work or alt phones.
If alt phone matches any existing home, work or alt phones.
How are the phone numbers stored? As numerics or text? Perhaps combining
the three fields into one and doing some matching on it..
If a concat of last name, street address and zip match any existing. -->
hash this into one value
dunno if that is a help or not, but something to consider
Bastien
From: <tg-php@xxxxxxxxxxxxxxxxxxxxxx>
To: <php-db@xxxxxxxxxxxxx>
Subject: Finding duplicate contacts..
Date: Tue, 26 Jul 2005 14:24:05 -0400
Ok, I'm at a point where I'm just going to throw this out there and see if
anyone has any good solutions because I'm just not seeing one. Of course
there's always a better solution, but I'm too brain dead right now to see
it. So any input would be appreciated.
Also..forgive me if this is more of a MySQL issues.. I'm trying to find
the right combination of MySQL and PHP interaction to make this all work
properly so thought this would be the better forum (versus just the MySQL
lists) and that there would be plenty of people with insight on both to
add their 2 cents.
Basic info:
We have a table with contacts and we need to determine if someone's
already in there but with possibly slightly different info or if the old
data contains duplicates. Current criteria (I'm sure there's better):
If home phone matches any existing home, work or alt phones.
If work phone matches any existing home, work or alt phones.
If alt phone matches any existing home, work or alt phones.
If a concat of last name, street address and zip match any existing.
Table currently contains about 60,000 entries (imported from old system)
and we can expect it to grow.
Indexes are on home, work and alt phones, as well as last name, street
address and zip. Tried it without indexes, it was slow.. tried it with
indexes.. still about the same speed.
Tried seeing if putting the dupe check into a prepared statement sped it
up any. It didn't.
Analyzed table. Same speed.
Out of 60,000 entries, we have roughly 40,000+ unique phone numbers
(that's doing a union on all three phone number columns). So even
pre-compiling just that list and querying another table is going to be
similar in time issues it seems.
For new contacts being brought in, we might have 30-60 at a time (maybe
more) so this query would end up being run on each of the 30-60. At 1.5-2
sec for each query, that ends up being a full minute or two to do the
whole scan.
If I pull the phone #'s and name/address/zip combies into an array in PHP
first, it's only like 12 secs, but each user would end up pulling it (and
maybe store in a session variable or something) and re-pull it
periodically to make sure it's semi-fresh. This sounds like the quicker,
but not the "best" solution.
MySQL version 4.1.11-standard
PHP version 4.3.4
Any tricks, tips or philosophical and/or theoretical things I've just not
seeing, please let me know.
Thanks
-TG
___________________________________________________________
Sent by ePrompter, the premier email notification software.
Free download at http://www.ePrompter.com.
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php