I have seen many examples of just getting rid of duplicate records
all in the same file.
However, I need to know how to append one MySQL table into another
table without
adding duplicate records into the original file accepting the data.
I ran the following script that successfully appended data from what
I called the "tervuren_target"
table into the file that drives the species level of my botanical
wood tree, named
"species_master".
..................................................................................................................................................................................
INSERT INTO species_master (genera, name, species_name, authorities,
species_source)
SELECT genera, name, species_name, authorities, species_source FROM
'tervuren_target'
WHERE tervuren_target.species_name != species_master.species_name;
..................................................................................................................................................................................
The last line was an attempt to avoid carrying over duplicates from
the Tervuren file into the species file
but an error said that it did not recognize
species_master.species_name. So, I tried putting single
quotes as 'species_master.species_name'. The program now ran without
errors and appended records
for a total of 17,685 combined records.
However, it did not take long to realize that what it had done is
interpret 'species_master.species_name' as
a string, not a column! Since none of Tervuren records would have
that string in them, it let all the records to append
including (sigh ...) 3,431 duplicate records!
To clean this up, I realize that if I just use a simple statement
with DISTINCT in it, duplicate records from the original
(and precious) records could get erased instead of the duplicate
record that was appended in from the Tervuren data.
That would be a really bad disaster! I will see what ideas you all
have but I am thinking it might be much safer to do
the append operation over again with a better script that will stop
duplicates from carrying over to the species_master
file in the first case instead of trying to clean up the merged file.
(... After all, that was what I was trying to do anyway).
On structures, here are some facts you will wish to have:
- I first made sure that the size, collation, engine and
field names used (as shown above) were identical in both the
tervuren_target table (the source of data to append) and
the species_master table.
- None of the data in the species_master file should ever be
erased by this append of new data.
- The species_master table has many more fields as the
tervuren_target table but any field for the tervuren_target
data being transferred has exact matching fields in the
species_master table. They are under the same folder.
- The comparison of which records is new is by using the
species_name field as comparison keys in both. If they are
identical in both tables for each new record being
appended, that record should not be allowed in, thereby not allowing
duplicate wood names in. (The meaning of duplicate in
this situation is therefore NOT meant from within a table but
across tables.)
I have no idea if there is a way to use 'Distinct" across tables. How
is my best way to merge in new data from the one file
to expand the records in a master file without letting duplicate data in?
Once the append without duplicates works properly, I then copy it
over as table 'species' which is used as the main source
of data in the species level of the botanical tree.
(http://www.prowebcanada.com/taxa). Expanding the number of woods I can
find and report was the original top level reason for starting this project.
Conversationally:
Once this works for me, it will better than doubled the number of
existing wood species I can report. That will be most impressive
progress for the wood knowledge base I am building. Once I have a
general algorithm working, I have many other lists from which I may
also be able extract valued new data --- far faster than I have been
able to before.
I got kind of close but not quite there. I look forward to your help.
This should be relatively basic to most intermediate and
advanced programmers :-) . I am still a junior programmer in PHP and
MySQL but more things are starting to work for me ;-) .
With thanks any and all help in advance,
Bill Mudry
Mississauga, Ontario