Re: Appending one MySQL file to another without duplicates.

Hi

As this is not a PHP question, and in particular not a Windows-related
PHP question, I'll keep my reply brief.

If you had a unique index defined on the comparison column, duplicates
wouldn't be allowed. Then you would only need to add IGNORE to the
INSERT so that duplicate rows are skipped instead of making the whole
statement fail.
http://dev.mysql.com/doc/refman/5.1/en/insert.html
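
For example, roughly, using the table and column names from your
message (the index name idx_species_name is only an illustration, and
the ALTER will fail if species_master already contains duplicate
species_name values):

  -- Enforce uniqueness on the comparison key, then let IGNORE skip
  -- any incoming row whose species_name is already present.
  ALTER TABLE species_master
    ADD UNIQUE INDEX idx_species_name (species_name);

  INSERT IGNORE INTO species_master
    (genera, name, species_name, authorities, species_source)
  SELECT genera, name, species_name, authorities, species_source
  FROM tervuren_target;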

If that is not possible, then use a join query to identify the
non-duplicate rows before inserting them.
http://dev.mysql.com/doc/refman/5.1/en/join.html
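
For example, something along these lines (again just a sketch built
from the names in your message); the LEFT JOIN / IS NULL test filters
out any tervuren_target row whose species_name already exists in
species_master:

  INSERT INTO species_master
    (genera, name, species_name, authorities, species_source)
  SELECT t.genera, t.name, t.species_name, t.authorities,
         t.species_source
  FROM tervuren_target AS t
  LEFT JOIN species_master AS m ON m.species_name = t.species_name
  WHERE m.species_name IS NULL;

A NOT EXISTS subquery would do the same job; which reads better is a
matter of taste.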

> I have seen many examples of just getting rid of duplicate records
> all within the same table.
> 
> However, I need to know how to append one MySQL table to another
> without adding duplicate records to the table receiving the data.
> 
> I ran the following script, which successfully appended data from
> what I called the "tervuren_target" table into the table that drives
> the species level of my botanical wood tree, named "species_master":
> ....................................................................
> INSERT INTO species_master (genera, name, species_name, authorities, 
> species_source)
> 
> SELECT genera, name, species_name, authorities, species_source FROM 
> 'tervuren_target'
> 
> WHERE tervuren_target.species_name != species_master.species_name;
> ....................................................................
> 
> The last line was an attempt to avoid carrying over duplicates from
> the Tervuren table into the species table, but an error said that it
> did not recognize species_master.species_name. So I tried putting
> single quotes around it, as 'species_master.species_name'. The
> program then ran without errors and appended records for a total of
> 17,685 combined records.
> 
> However, it did not take long to realize that what it had done was
> interpret 'species_master.species_name' as a string, not a column!
> Since none of the Tervuren records would have that string in them, it
> let all the records append, including (sigh ...) 3,431 duplicate
> records!
> 
> To clean this up, I realize that if I just use a simple statement
> with DISTINCT in it, the original (and precious) records could get
> erased instead of the duplicate records that were appended in from
> the Tervuren data. That would be a really bad disaster! I will see
> what ideas you all have, but I am thinking it might be much safer to
> do the append operation over again with a better script that stops
> duplicates from carrying over to the species_master table in the
> first place, instead of trying to clean up the merged table.
> (... After all, that was what I was trying to do anyway.)
> 
> On structures, here are some facts you will wish to have:
> 
>          - I first made sure that the size, collation, engine and
>            field names used (as shown above) were identical in both
>            the tervuren_target table (the source of the data to
>            append) and the species_master table.
> 
>          - None of the data in the species_master table should ever
>            be erased by this append of new data.
> 
>          - The species_master table has many more fields than the
>            tervuren_target table, but every field being transferred
>            from tervuren_target has an exactly matching field in the
>            species_master table. Both tables are under the same
>            folder.
> 
>          - The comparison of which records are new uses the
>            species_name field as the comparison key in both tables.
>            If the values are identical in both tables for a record
>            being appended, that record should not be allowed in,
>            thereby keeping duplicate wood names out. (The meaning of
>            duplicate in this situation is therefore NOT within a
>            single table but across the two tables.)
> 
> I have no idea if there is a way to use DISTINCT across tables. What
> is my best way to merge in new data from the one table to expand the
> records in the master table without letting duplicate data in?
> 
> Once the append without duplicates works properly, I then copy the
> result over as table 'species', which is used as the main source of
> data for the species level of the botanical tree
> (http://www.prowebcanada.com/taxa). Expanding the number of woods I
> can find and report was the original top-level reason for starting
> this project.
> 
> Conversationally:
> Once this works for me, it will better than double the number of
> existing wood species I can report. That will be most impressive
> progress for the wood knowledge base I am building. Once I have a
> general algorithm working, I have many other lists from which I may
> also be able to extract valuable new data, far faster than I have
> been able to before.
> 
> I got kind of close but not quite there. I look forward to your
> help. This should be relatively basic to most intermediate and
> advanced programmers :-) . I am still a junior programmer in PHP and
> MySQL, but more things are starting to work for me ;-) .
> 
> 
> With thanks for any and all help in advance,
> 
> Bill Mudry
> Mississauga, Ontario
> 

--
Niel Archer



-- 
PHP Windows Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

