Re: Getting Similar rows from db

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, February 23, 2007 12:35 pm, Wikus Möller wrote:
> I need a script that selects all similar rows from a database. Like
> only the id would differ, but other fields would be the same as other
> rows's fields.
>
> I don't know if there is such a mysql statement like SELECT SIMILAR
> ...
>
> Please assist me with using the right mysql statement or using php to
> get similar rows from the db.

The most common way to do this is to do a "self join" where you
essentially compare TWO copies of the same table with itself:

SELECT a.id, b.*
FROM your_table AS a, your_table AS b
WHERE a.id < b.id
  AND a.field1 = b.field1
  AND a.field2 = b.field2
  AND a.field3 = b.field3
  .
  .
  .

The restriction of a.id < b.id is so you only get notified ONCE about
each "duplicate" -- Take that out, and you get told that A and B are
similar, and that B and A are similar, which is rarely, if ever,
useful.

WARNING:
On a moderate sized table, this can bring your database server to its
knees in a hurry...
E.g., take a couple thousand rows, do a self-join, and you're trying
to process MILLIONS of records.
This usually swaps out to disk, and slogs the whole machine unless
you're on a dedicated server.  At least, that was my experience.

Since this question usually arises after somebody has screwed up and
inserted duplicate records, if you're in that situation, I'd also
highly recommend that you figure out exactly how that happened, and
make sure it can't happen again, because weeding out the duplicates
after the fact is error-prone and gets expensive fast. :-)

To bring this on-topic then, it's sometimes better to:

A) Write a script in PHP to process one record at a time, searching
for duplicates for that record, and then sleep for a second or three.

B) Write a PHP web interface to allow humans to merge duplicates or
similar records, where they already know what the duplicates are from
stumbling across them.  This is probably the ONE INSTANCE where you
actually should violate the rule about not showing the internal ID to
the end user... :-)

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


[Index of Archives]     [PHP Home]     [Apache Users]     [PHP on Windows]     [Kernel Newbies]     [PHP Install]     [PHP Classes]     [Pear]     [Postgresql]     [Postgresql PHP]     [PHP on Windows]     [PHP Database Programming]     [PHP SOAP]

  Powered by Linux