Re: Still having a hard time transferring records table to table

Niel Archer <not@xxxxxxxxxx> · Sun, 13 Jun 2010 15:10:35 +0100

> Back about May 10, I presented a difficulty I was having trying to 
> add in more (distinct) records from one
> MySQL table into another that permanently holds the most important 
> data that I have. Despite help from
> a couple of you in the forum and numerous efforts by me since, 
> success on my efforts have been denied
> <big sigh!>.
> 
> First, some review and introduction:
> This is for a knowledge base on all woods around the world. The more 
> woods can be reported,
> the more successful is the project. I have named the project 'TAXA" 
> (..short for taxonomy). Many of the
> other features I have wanted to add have been coming on quite well 
> but this task is still evading any success.
> If you care to directly view what this is about, take a look at 
> http://www.prowebcanada.com/taxa/.
> In fact, you can escape the frames menu and go right to where species 
> only are displayed:
> http://www.prowebcanada.com/taxa/alphaspecies.php.
> There are 4 major entry points to all the data but you have to be 
> concerned with the 'species' level.
> 
> The data there is stored in a table named 'species' (ie. - 
> species.sql). All tables in the project are in database 'taxa'. The 
> more (botanical) names of woods that I can find and add in, the more 
> successful is the whole project. So far I am reporting just over 
> 6,200 different woods (as I say, all in species.sql).
> 
> The Thorny Problem
> Presently I use all kinds of sources to find new woods. The largest 
> list of woods I have is in a sql file
> called 'tervuren', with 11,337 wood names. That is WAAYY larger than 
> what is in species.sql so there
> is a huge interest in copying over these records to species.sql  It 
> would be a huge boost that even with
> duplicates avoided, that would almost double the number of woods I 
> could report!
> 
> That is easy said but I have almost being tearing my hair out (... 
> and I am almost bald anyway ;-) )
> trying to accomplish this. To prepare for this:
> 
>          - I protected the integrity of my original files 
> (species.sql and tervuren.sql) by copying them
>            as working files species_master.sql and 
> tervuren_target.sql. It is with these that I will
>            (somehow) copy all wood species names that are NOT in the 
> species_master table that
>            are, however, in the tervuren_target file. (Later after 
> species_master is expanded, I can
>            rename it back again to 'species').
> 
>          - The key column for comparison in both files is called 
> 'species_name' in both files.
> 
>          - I have already treated the tervuren_target table so that 
> there are no duplicate records
>            within itself with respect to column species_name.
> 
>          - There are a couple columns I do not have to carry over to 
> the species_master table from
>             the tervuren_target table. For the columns that do have 
> to, I made sure
>             they are named exactly the same in both files and that 
> the collation is the same in both.
> 
>          - The field names common to both the source for my data (ie. 
> tervuren_target) and for
>             the destination for the new data (species_master) are:
> 
>                  genus_name, species_name, authorities_species, source
> 
>          - The species_master table has about 18 total fields but 
> except for the ones above, none of the
>             other fields I have not listed will receive any new data 
> whatsoever. They are (so to speak) just
>             along for the ... ride. All new data will go into the 4 
> fields listed above exclusively.
> 
>          - In any attempt to avoid new duplicates forming as data 
> comes over into species_master,
>             at NO TIME WHATSOEVER can we allow any records in 
> species_master to be deleted!
>             The reason is because meaningful data in the other fields 
> in species_master would be
>             lost and data integrity would be violated.
> 
> Just adding ALL records from tervuren_ target was easy but it sure 
> did not give me the result i need. Both
> the tervuren_target table and the species_master table have wood 
> names in them that are common to each.
> As copying records over is happening, great care  has to be taken to 
> avoid NEW duplicates forming via
> the combining of records from both tables. In fact, just merging all 
> files from tervuren_target over into
> species_master created over 3,000 NEW DUPLICATES! The result was an 
> unusable mess!
> 
> ===> .... and that is where all the problem is that has stopped me 
> from being successful!
> 
> I am hoping that an SQL statement would do all the work fine .... but 
> if that will not work, I am also open to
> adding any PHP lines for more versatility if needed.
> 
> I have made a few column name changes since approaching the group 
> back around May 10 but other than that,
> I tried out a script tonight suggested to me back then:
> 
> SELECT DISTINCT tervuren_target.genus_name, 
> tervuren_target.species_name, tervuren_target.authorities_species, 
> tervuren_target.source
> FROM tervuren_target
> LEFT JOIN species_master
> ON tervuren_target.species_name = species_master.species_name
> LIMIT 0,12000
> 
> I had to add in the LIMIT statement or for some reason the server 
> would time out every time. The script ran ok
> but when I looked at the number of records it created (11,377) that 
> was EXACTLY how many records
> there are in the original Tervuren file! In other words, there was no 
> success in avoiding new duplicates at all
> and the data did NOT actually merge into species_master <big sigh!>. 
> I also predicted that the DISTINCT
> statement would only work on the tervuren_target table (which was 
> already cleaned of duplicates within itself)
> and I believe I was right. it seems to be extraneous and not the 
> needed answer.
> 
> .... I had never thought it should be that hard just to add in new 
> data to a table of central important without
> introducing new duplicate records --- but it is.
> 
> Therefore I come in humble request to all of you, hoping someone can 
> figure out what actually will work.
> The solution is harder than may first seem.  My email address if you 
> need it is billmudry at rogers.com
> 
> With thanks in advance,
> 
> Fingers crossed, toes crossed, eyeballs crossed .......
> 
> Bill Mudry
> Mississauga, Ontario Canada.

This would be much better taken to the DB list, where the entire
audience is knowledgable about databases, so answers are more likely ;-)

First, what database engine are you using? All engines have specific
idiosyncrasies that need to be accounted for, and knowing the engine
will allow more specific answers to your problems.

--
Niel Archer

-- 
PHP Windows Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php