Appending one MySQL file to another without duplicates.


 



I have seen many examples of removing duplicate records within a single table.

However, what I need is a way to append one MySQL table onto another without
adding duplicate records to the table receiving the data.

I ran the following script, which successfully appended data from a table I call "tervuren_target" into "species_master", the table that drives the species level of my botanical wood tree:
..................................................
INSERT INTO species_master (genera, name, species_name, authorities, species_source)
SELECT genera, name, species_name, authorities, species_source
FROM tervuren_target
WHERE tervuren_target.species_name != species_master.species_name;
..................................................

The last line was an attempt to avoid carrying duplicates from the Tervuren table into the species table, but MySQL returned an error saying it did not recognize species_master.species_name. So I tried wrapping it in single quotes as 'species_master.species_name'. The script then ran without errors and appended records,
for a total of 17,685 combined records.

However, it did not take long to realize what it had actually done: it interpreted 'species_master.species_name' as a string literal, not a column! Since no species_name in the Tervuren records matches that string, the condition was always true and every record was appended,
including (sigh ...) 3,431 duplicate records!

To clean this up, I realize that if I just use a simple statement with DISTINCT, a record from the original (and precious) data could get erased instead of the duplicate that was appended in from the Tervuren data. That would be a real disaster! I will see what ideas you all have, but I am thinking it would be much safer to redo the append with a better script that stops duplicates from carrying over into species_master in the first place, rather than trying to clean up the merged table. (After all, that was what I was trying to do anyway.)
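If I do end up having to undo the bad append instead, the only safe pattern I can think of is to delete exactly the rows that run added. This is only a sketch and rests on two assumptions that may not hold for my table: that species_master has an AUTO_INCREMENT id column, and that I know the highest id before the append (the value below is hypothetical):

```sql
-- Sketch only: assumes species_master has an AUTO_INCREMENT column
-- named id, and that the highest id before the bad append is known.
-- Rows added by the bad run all have higher ids, so deleting them
-- restores the table to its pre-append state.
SET @last_good_id = 14254;  -- hypothetical value, not the real one

DELETE FROM species_master
WHERE id > @last_good_id;
```

Unlike DISTINCT, this cannot touch any of the original records, because it only ever deletes rows created after the cutoff.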

On structures, here are some facts you will want to have:

- I first made sure that the column sizes, collation, engine, and field names used (as shown above) were identical in both the tervuren_target table (the source of the data to append) and the species_master table.

- None of the data already in the species_master table should ever be erased by this append of new data.

- The species_master table has many more fields than the tervuren_target table, but every field being transferred from tervuren_target has an exactly matching field in species_master. Both tables are in the same database.

- Whether a record is new is determined by comparing the species_name field in both tables. If the species_name of a record being appended already exists in species_master, that record should not be allowed in, thereby keeping duplicate wood names out. (Duplicate in this situation therefore means across tables, NOT within one table.)
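Since species_name is the comparison key, one option I have read about is to let MySQL enforce the rule itself with a UNIQUE index, so that duplicate inserts are silently skipped. This sketch assumes species_master currently has no duplicate species_name values of its own (otherwise the ALTER will fail), and note that INSERT IGNORE also suppresses other errors, not just duplicate-key ones:

```sql
-- Make species_name unique in the master table; any INSERT that
-- would duplicate an existing species_name is then skipped.
ALTER TABLE species_master
  ADD UNIQUE INDEX uniq_species_name (species_name);

INSERT IGNORE INTO species_master
  (genera, name, species_name, authorities, species_source)
SELECT genera, name, species_name, authorities, species_source
FROM tervuren_target;
```

The nice side effect is that the index would also protect species_master from duplicates in any future appends from other lists.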

I have no idea whether there is a way to use DISTINCT across tables. What is the best way to merge new data from one table
into a master table without letting duplicate data in?
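From what I have been able to piece together, the usual way to express "only rows whose species_name is not already in the master" is an anti-join, for example with NOT EXISTS. A sketch using the column names above:

```sql
INSERT INTO species_master
  (genera, name, species_name, authorities, species_source)
SELECT t.genera, t.name, t.species_name, t.authorities, t.species_source
FROM tervuren_target AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM species_master AS m
    WHERE m.species_name = t.species_name
);
```

Unlike my quoted attempt, species_master here appears inside a subquery of its own, so the column reference m.species_name actually resolves instead of being treated as an unknown name (or a string).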

Once the append without duplicates works properly, I will copy the result over as table 'species', which is used as the main source of data for the species level of the botanical tree (http://www.prowebcanada.com/taxa). Expanding the number of woods I can
find and report was the original top-level reason for starting this project.

Conversationally:
Once this works for me, it will more than double the number of wood species I can report. That will be most impressive progress for the wood knowledge base I am building. Once I have a general algorithm working, I have many other lists from which I may also be able to extract valuable new data --- far faster than I have been able to before.

I got kind of close but not quite there, so I look forward to your help. This should be relatively basic for most intermediate and advanced programmers :-) . I am still a junior programmer in PHP and MySQL, but more things are starting to work for me ;-) .


With thanks for any and all help in advance,

Bill Mudry
Mississauga, Ontario

