Search Postgresql Archives

Re: Group by on %like%

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

Le 3/07/09 12:53, Dimitri Fontaine a écrit :
Hi,

Le 3 juil. 09 à 11:44, Jennifer Trey a écrit :
I would like to run a query and group several rows based on a phone
number.
However, the same phone number might have a prefix on occasion, example :

name | phone_number
----------------------
james | 123456
james | 00441234556
as you can see, the first 2 James seems to belong together.

What I would do is provide a normalize_phone_number(phone_number text),
such as it returns the same phone number when given a number with or
without international prefix.

Then you
SELECT name, normalize_phone_number(phone_numer)
FROM relation
GROUP BY 1, 2;
[...]

The solution suggested by Dimitri Fontaine and based on a customized function for normalizing phone numbers seems to be a clean one. All the power is contained in the normalize_phone_number() implementation.

The following query may be an alternative solution that does not require any tier function except the classic aggregative ones (COUNT(), SUM()):

SELECT P3.name, P3.phone_number
FROM (
	SELECT P1.name, P1.phone_number, (
		CASE WHEN CHAR_LENGTH(P1.phone_number) >= CHAR_LENGTH(P2.phone_number)
		THEN 1
		ELSE 0
		END
	) AS gec
	FROM (
		SELECT P01.name, P01.phone_number
		FROM pnd AS P01
		GROUP BY P01.name, P01.phone_number
	) AS P1 INNER JOIN (
		SELECT P02.name, P02.phone_number
		FROM pnd AS P02
		GROUP BY P02.name, P02.phone_number
	) AS P2
	ON P1.name = P2.name
	AND (
		CASE WHEN CHAR_LENGTH(P1.phone_number) >= CHAR_LENGTH(P2.phone_number)
		THEN P1.phone_number LIKE ('%'||P2.phone_number)
		ELSE P2.phone_number LIKE ('%'||P1.phone_number)
		END
	)
) AS P3
GROUP BY P3.name, P3.phone_number
HAVING COUNT(*) = SUM(P3.gec)

"pnd" is assumed to be the main table including "name" and "phone_number" columns. "pnd" is directly used as a table source in subqueries aliased P1 and P2 and only for those subqueries.

Assuming the starting values in the table "pnd" as following:

name | phone_number
----------------------
james | 123456
james | 0044123456
james | 555666
sarah | 567890
sarah | 567890

(notice that the phone_number of the 2nd row has been adjusted for similarity to be effective between row 1 and row 2)

The resulting rows from the overall query will be:

name | phone_number
----------------------
james | 0044123456
james | 555666
sarah | 567890

The choice has been made here to keep the longuest phone_number for each set of similar phone_numbers. The shortest could also be kept if desired.

The overall query implies a few subqueries. Subquery aliased P3 is a join between P1 and P2, both corresponding to the same subquery. The difference is in expressing the join conditions: i) on the commun column "name"; and ii) on the likelihood between phone numbers according to the length of these latter. Function CHAR_LENGTH() is used instead of LENGTH() because the first renders the real number of characters whereas the second gives the number of bytes used to encode the argument.

Table P3 is composed of couples (X, Y) of "name" and "phone_number". Each couple is associated to the number "gec" resulting from the counting of phone_numbers Z similar to Y and with CHAR_LENGTH(Y) greater or equal to CHAR_LENGTH(Z).

Eventually only the rows of P3 for which the sum of "gec" is equal to the number of rows of P3 where the value of "phone_number" is the same are kept.

Hoping this alternative solution will help a little (validated with PostgreSQL 8.3.1).

Regards.

P-S: I think this question might also have interested the PgSQL-SQL mailing list and posted there.

--
nha / Lyon / France.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux