Search Postgresql Archives

Re: is (not) distinct from

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/07/2017 01:22 AM, Johann Spies wrote:
Thanks (again Adrian) and Tom.

The situation was that I had a table with 731million records which I
wanted to copy into a partitioned one where there was a unique
constraint on the fields used in my query.

The "backup" table was the single one.

While inserting into the partitioned table from the backup one, several
(about 120000) records failed to insert.

I wanted to find out which records were involved and found that some had
"'" characters in the values which broke some of the functions used to
do some calculations..

As there were fields that might have null values I have tried the "is
not distinct from".

Both sides of the query had primary keys and I did not use group by.
That was why I used "distinct".

Would it not be easier to use a LEFT JOIN between the original table(backup) and the new table:

https://www.postgresql.org/docs/9.6/static/sql-select.html

join_type

"LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting null values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards."

So something like:

SELECT
	a.pk
FROM
	original_table AS a
LEFT JOIN                    --The OUTER is not required
	new_table AS b
ON
	a.pk = b.pk
WHERE
	b.pk IS NULL

That would show all the rows in the original table that where not transferred over.


Anyhow in the end, I made some progress with a modified query:

where
s.citing_article = A.citing_article
   and
          s.cited_article !=  A.cited_article
   and
   s.pubyear is  distinct from A.pubyear
   and
   s.year_cited is distinct from A.year_cited
   and
   s.cited_author is distinct from A.cited_author
   and
        regexp_replace(s.cited_title,  $$'$$, $$''$$,'g') is distinct from
        regexp_replace(A.cited_title,  $$'$$, $$''$$,'g')
   and
        regexp_replace(s.cited_work,  $$'$$, $$''$$,'g') is distinct
from regexp_replace(A.cited_work,   $$'$$, $$''$$,'g')
   and
        s.doi is distinct from A.doi

Regards.
Johann

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux