Appending one MySQL file to another without duplicates.

Bill Mudry <billmudry@xxxxxxxxxx> · Mon, 10 May 2010 01:03:29 -0400

I have seen many examples of just getting rid of duplicate records 
all in the same file.

However, I need to know how to append one MySQL table into another 
table without
adding duplicate records into the original file accepting the data.

I ran the following script that successfully appended data from what 
I called the "tervuren_target"
table into the file that drives the species level of my botanical 
wood tree, named
"species_master".
..................................................................................................................................................................................
INSERT INTO species_master (genera, name, species_name, authorities, 
species_source)

SELECT genera, name, species_name, authorities, species_source FROM 
'tervuren_target'

WHERE tervuren_target.species_name != species_master.species_name;
..................................................................................................................................................................................

The last line was an attempt to avoid carrying over duplicates from 
the Tervuren file into the species file
but an error said that it did not recognize 
species_master.species_name. So, I tried putting single
quotes as 'species_master.species_name'. The program now ran without 
errors and appended records
for a total of 17,685 combined records.

However, it did not take long to realize that what it had done is 
interpret 'species_master.species_name' as
a string, not a column! Since none of Tervuren records would have 
that string in them, it let all the records to append
including (sigh ...) 3,431 duplicate records!

To clean this up, I realize that if I just use a simple statement 
with DISTINCT in it, duplicate records from the original
(and precious) records could get erased instead of the duplicate 
record that was appended in from the Tervuren data.
That would be a really bad disaster! I will see what ideas you all 
have but I am thinking it might be much safer to do
the append operation over again with a better script that will stop 
duplicates from carrying over to the species_master
file in the first case instead of trying to clean up the merged file. 
(... After all, that was what I was trying to do anyway).

On structures, here are some facts you will wish to have:

        - I first made sure that the size, collation, engine and 
field names used (as shown above) were identical in both the
          tervuren_target table (the source of data to append) and 
the species_master table.

        - None of the data in the species_master file should ever be 
erased by this append of new data.

        - The species_master table has many more fields as the 
tervuren_target table but any field for the tervuren_target
           data being transferred has exact matching fields in the 
species_master table. They are under the same folder.

        - The comparison of which records is new is by using the 
species_name field as comparison keys in both. If they are
           identical in both tables for each new record being 
appended, that record should not be allowed in, thereby not allowing
           duplicate wood names in. (The meaning of duplicate in 
this situation is therefore NOT meant from within a table but
          across tables.)

I have no idea if there is a way to use 'Distinct" across tables. How 
is my best way to merge in new data from the one file
to expand the records in a master file without letting duplicate data in?

Once the append without duplicates works properly, I then copy it 
over as table 'species' which is used as the main source
of data in the species level of the botanical tree. 
(http://www.prowebcanada.com/taxa).  Expanding the number of woods I can
find and report was the original top level reason for starting this project.

Conversationally:
Once this works for me, it will better than doubled the number of 
existing wood species I can report. That will be most impressive 
progress for the wood knowledge base I am building. Once I have a 
general algorithm working, I have many other lists from which I may 
also be able extract valued new data --- far faster than I have been 
able to before.

I got kind of close but not quite there. I look forward to your help. 
This should be relatively basic to most intermediate and
advanced programmers :-) . I am still a junior programmer in PHP and 
MySQL but more things are starting to work for me ;-) .

With thanks any and all help in advance,

Bill Mudry
Mississauga, Ontario