On 03/07/2017 01:22 AM, Johann Spies wrote:
Thanks (again Adrian) and Tom.
The situation was that I had a table with 731million records which I
wanted to copy into a partitioned one where there was a unique
constraint on the fields used in my query.
The "backup" table was the single one.
While inserting into the partitioned table from the backup one, several
(about 120000) records failed to insert.
I wanted to find out which records were involved and found that some had
"'" characters in the values which broke some of the functions used to
do some calculations..
As there were fields that might have null values I have tried the "is
not distinct from".
Both sides of the query had primary keys and I did not use group by.
That was why I used "distinct".
Would it not be easier to use a LEFT JOIN between the original
table(backup) and the new table:
https://www.postgresql.org/docs/9.6/static/sql-select.html
join_type
"LEFT OUTER JOIN returns all rows in the qualified Cartesian product
(i.e., all combined rows that pass its join condition), plus one copy of
each row in the left-hand table for which there was no right-hand row
that passed the join condition. This left-hand row is extended to the
full width of the joined table by inserting null values for the
right-hand columns. Note that only the JOIN clause's own condition is
considered while deciding which rows have matches. Outer conditions are
applied afterwards."
So something like:
SELECT
a.pk
FROM
original_table AS a
LEFT JOIN --The OUTER is not required
new_table AS b
ON
a.pk = b.pk
WHERE
b.pk IS NULL
That would show all the rows in the original table that where not
transferred over.
Anyhow in the end, I made some progress with a modified query:
where
s.citing_article = A.citing_article
and
s.cited_article != A.cited_article
and
s.pubyear is distinct from A.pubyear
and
s.year_cited is distinct from A.year_cited
and
s.cited_author is distinct from A.cited_author
and
regexp_replace(s.cited_title, $$'$$, $$''$$,'g') is distinct from
regexp_replace(A.cited_title, $$'$$, $$''$$,'g')
and
regexp_replace(s.cited_work, $$'$$, $$''$$,'g') is distinct
from regexp_replace(A.cited_work, $$'$$, $$''$$,'g')
and
s.doi is distinct from A.doi
Regards.
Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general