Still cannot append one table to another without duplicates following.

Bill Mudry <billmudry@xxxxxxxxxx> · Sun, 16 May 2010 01:21:55 -0400

I thank Tyrael and Neil for their prior efforts a few days ago. 
Unfortunately their
suggestions did not work. I have searched the web and tried many 
different combinations
but I am surprised that it is that hard to add from one table of 
about 11,300 records into
a master file I use with about 6,550 relevant records while keeping 
duplicates in the larger
file to what is already in the smaller one stopped.

This has made me think that first, perhaps I have to get back to 
understanding some of the
basics of how various SQL commands and statements work and also how they do NOT
work.

Please help me confirm as true what I will say below or else correct 
my understanding.

ISSUE 1

DISTINCT - This is commonly used to get rid of duplicates in ONE 
specific file. All records
        being found to be duplicate is always in the ONE table and 
duplicates are never
defined as duplicate to another record in another table. I have never 
seen any example or
reference in which it can be used to prevent new files coming into a 
table to be a
duplicate against all records already in a table. Is this not true? 
Correct me if I am wrong.

This massive addition of records to the master table is the scenario 
I am trying to do
while not introducing new duplicates common to both. What is in the 
table accepting new
records must not have any records erased in an effort to not have 
duplicates. It just has too
much additional data worth preserving.

The file to be appended to (called species_master) has the exact 4 
same column names
as in the first table with the 11,300 records (which I named 
tervuren_target) but with many
more fields that would tag along for the ride during a merge of the 
two files.

The tervuren_target file has already been cleaned of itself of 
duplicate records. Therefore,
if my comment on using DISTINCT is true  ..... and cannot be used to 
prevent new
duplicates from forming when records are transferred from one table 
to another, then in the
append or merge I am trying for ..... using 'DISTINCT' to stop new 
duplicates would be a
waste of time! True or False?

ISSUE 2
When using a JOIN, we usually equate a foreign key in one table to a 
primary key of the
table being joined to. However, in my need, I am not talking about a 
relational connection
between the two tables. I had to first merge the genus and epithet 
columns together in the
tervuren_target table to create one-column species names in column 
species_name.

Once that was done, all the columns I want to copy over into the 
master species file
were deliberately made the same as in the species_master file. There 
is no hierarchy
between them --- only more of the same kind of data coming in from a 
HUGE source.

SO ...... on a Join (or any other efforts to bring over the data 
WITHOUT duplicates)
as said above, it is normally written written as table1 = table2. In 
this case, the
comparison would be .....
        tervuren_target.species_name = species_master.species_name.

BUT wait! That would allow all records with a species_name equal to one in the
master file to come flying in, right?? Wow, that is the very OPPOSITE 
OF WHAT I WANT!
Unless there is something I do not know yet ---> that seems like it 
would let nothing BUT
duplicates in!! ..... Totally and absolutely not acceptable if that 
would happen.

Therefore, with joins, is it only the equal operator that can be 
used? I am leaning to think
that the following is not allowed even if it spells out the condition 
that I really need:
        tervuren_target.species_name  <> species_master.species_name.

So please do correct me if my understanding is false or inadequate. 
For now, assuming
I am correct, then just how does one specify how to add further 
records into a table
from another table with parallel fields while stopping records coming 
over that would
create duplicate data (in this case duplicates in the species_name field?

Because of the close similarity of structures of the two tables and 
because of data
merging into the same key column to create duplicates (guaranteed 
---- over 3,000 of
them!) I am now wondering if using a JOIN can be used, should be used 
or is relevant
at all. So I still need to find how to merge over all those tervuren 
records without
creating NEW duplicates in the process. First, though, there is one more issue.

ISSUE 3

The script that was suggested to me is:

INSERT INTO species_master (genera, name, species_name, authorities,
species_source)

SELECT DISTINCT genera, name, species_name, authorities, species_source FROM
'tervuren_target'
LEFT JOIN species_master
ON
tervuren_target.species_name = species_master.species_name
WHERE species_master.species_name IS NULL

My next issue is about the last line:
WHERE species_master.species_name IS NULL
        I believe I was told this was to help prevent duplicates --- 
but I cannot see that. Here is my understanding
        why. Perhaps I do not know enough details about how that 
WHERE statement is parsed and made to work.
Before the tables are merged, all records in 
species_master.species_name have species names in
them. That column just does not have ANY NULL entries! Therefore ---- 
am I not correct to say that there is
absolutely no conditions to satisfy the WHERE statement? Nothing can happen!

One other thing just came to mind. In all my efforts in this, as I 
ran different script attempts, I would frequently
realize I had to state fields as "table.column" only to find that the 
system would say it does not know that column.
Strange thought I. Why would it do that? Did I still do something 
wrong or incomplete?

One last word before I am told I have been off topic. My entire 
project is done in PHP with storage in MySQL.
I was hoping that doing this above merge of data in PHP might be 
overkill if SQL can handle it. I am still
quite open to a PHP script solution if that is better or if SQL just 
cannot cut it.

So ...... <mild sigh> ...... what really will do the job?

Always thankful for help,

Bill Mudry
Mississauga, Ontario