Still having a hard time transferring records table to table

Bill Mudry <billmudry@xxxxxxxxxx> · Sun, 13 Jun 2010 01:43:34 -0400

Back about May 10, I presented a difficulty I was having trying to 
add in more (distinct) records from one
MySQL table into another that permanently holds the most important 
data that I have. Despite help from
a couple of you in the forum and numerous efforts by me since, 
success on my efforts have been denied
<big sigh!>.

First, some review and introduction:
This is for a knowledge base on all woods around the world. The more 
woods can be reported,
the more successful is the project. I have named the project 'TAXA" 
(..short for taxonomy). Many of the
other features I have wanted to add have been coming on quite well 
but this task is still evading any success.
If you care to directly view what this is about, take a look at 
http://www.prowebcanada.com/taxa/.
In fact, you can escape the frames menu and go right to where species 
only are displayed:
http://www.prowebcanada.com/taxa/alphaspecies.php.
There are 4 major entry points to all the data but you have to be 
concerned with the 'species' level.

The data there is stored in a table named 'species' (ie. - 
species.sql). All tables in the project are in database 'taxa'. The 
more (botanical) names of woods that I can find and add in, the more 
successful is the whole project. So far I am reporting just over 
6,200 different woods (as I say, all in species.sql).

The Thorny Problem
Presently I use all kinds of sources to find new woods. The largest 
list of woods I have is in a sql file
called 'tervuren', with 11,337 wood names. That is WAAYY larger than 
what is in species.sql so there
is a huge interest in copying over these records to species.sql  It 
would be a huge boost that even with
duplicates avoided, that would almost double the number of woods I 
could report!

That is easy said but I have almost being tearing my hair out (... 
and I am almost bald anyway ;-) )
trying to accomplish this. To prepare for this:

        - I protected the integrity of my original files 
(species.sql and tervuren.sql) by copying them
          as working files species_master.sql and 
tervuren_target.sql. It is with these that I will
          (somehow) copy all wood species names that are NOT in the 
species_master table that
          are, however, in the tervuren_target file. (Later after 
species_master is expanded, I can
          rename it back again to 'species').

        - The key column for comparison in both files is called 
'species_name' in both files.

        - I have already treated the tervuren_target table so that 
there are no duplicate records
          within itself with respect to column species_name.

        - There are a couple columns I do not have to carry over to 
the species_master table from
           the tervuren_target table. For the columns that do have 
to, I made sure
           they are named exactly the same in both files and that 
the collation is the same in both.

        - The field names common to both the source for my data (ie. 
tervuren_target) and for
           the destination for the new data (species_master) are:

                genus_name, species_name, authorities_species, source

        - The species_master table has about 18 total fields but 
except for the ones above, none of the
           other fields I have not listed will receive any new data 
whatsoever. They are (so to speak) just
           along for the ... ride. All new data will go into the 4 
fields listed above exclusively.

        - In any attempt to avoid new duplicates forming as data 
comes over into species_master,
           at NO TIME WHATSOEVER can we allow any records in 
species_master to be deleted!
           The reason is because meaningful data in the other fields 
in species_master would be
           lost and data integrity would be violated.

Just adding ALL records from tervuren_ target was easy but it sure 
did not give me the result i need. Both
the tervuren_target table and the species_master table have wood 
names in them that are common to each.
As copying records over is happening, great care  has to be taken to 
avoid NEW duplicates forming via
the combining of records from both tables. In fact, just merging all 
files from tervuren_target over into
species_master created over 3,000 NEW DUPLICATES! The result was an 
unusable mess!

===> .... and that is where all the problem is that has stopped me 
from being successful!

I am hoping that an SQL statement would do all the work fine .... but 
if that will not work, I am also open to
adding any PHP lines for more versatility if needed.

I have made a few column name changes since approaching the group 
back around May 10 but other than that,
I tried out a script tonight suggested to me back then:

SELECT DISTINCT tervuren_target.genus_name, 
tervuren_target.species_name, tervuren_target.authorities_species, 
tervuren_target.source
FROM tervuren_target
LEFT JOIN species_master
ON tervuren_target.species_name = species_master.species_name
LIMIT 0,12000

I had to add in the LIMIT statement or for some reason the server 
would time out every time. The script ran ok
but when I looked at the number of records it created (11,377) that 
was EXACTLY how many records
there are in the original Tervuren file! In other words, there was no 
success in avoiding new duplicates at all
and the data did NOT actually merge into species_master <big sigh!>. 
I also predicted that the DISTINCT
statement would only work on the tervuren_target table (which was 
already cleaned of duplicates within itself)
and I believe I was right. it seems to be extraneous and not the 
needed answer.

.... I had never thought it should be that hard just to add in new 
data to a table of central important without
introducing new duplicate records --- but it is.

Therefore I come in humble request to all of you, hoping someone can 
figure out what actually will work.
The solution is harder than may first seem.  My email address if you 
need it is billmudry at rogers.com

With thanks in advance,

Fingers crossed, toes crossed, eyeballs crossed .......

Bill Mudry
Mississauga, Ontario Canada.