Still having a hard time transferring records table to table

Back about May 10, I described a difficulty I was having trying to add more (distinct) records from one MySQL table into another table that permanently holds the most important data I have. Despite help from a couple of you in the forum and numerous efforts on my part since, I still have not succeeded
<big sigh!>.

First, some review and introduction:
This is for a knowledge base on all the woods of the world. The more woods that can be reported, the more successful the project is. I have named the project 'TAXA' (short for taxonomy). Many of the other features I have wanted to add have been coming along quite well, but this one task is still eluding me. If you care to see what this is about, take a look at http://www.prowebcanada.com/taxa/. In fact, you can skip the frames menu and go right to where only species are displayed:
http://www.prowebcanada.com/taxa/alphaspecies.php.
There are 4 major entry points to all the data, but only the 'species' level matters here.

The data there is stored in a table named 'species' (i.e., species.sql). All tables in the project are in database 'taxa'. The more (botanical) names of woods I can find and add, the more successful the whole project becomes. So far I am reporting just over 6,200 different woods (as I say, all in species.sql).

The Thorny Problem
Presently I use all kinds of sources to find new woods. The largest list of woods I have is in an SQL file called 'tervuren', with 11,337 wood names. That is WAAYY larger than what is in species.sql, so there is a huge interest in copying those records over to species.sql. Even with duplicates avoided, that would almost double the number of woods I could report!

That is easily said, but I have almost been tearing my hair out (... and I am almost bald anyway ;-) )
trying to accomplish it. To prepare for this:

- I protected the integrity of my original files (species.sql and tervuren.sql) by copying them as working files, species_master.sql and tervuren_target.sql. It is with these that I will (somehow) copy over all wood species names that are NOT in the species_master table but ARE in the tervuren_target table. (Later, after species_master is expanded, I can
          rename it back to 'species'.) A sketch of how copies like these can be made appears after this list.

- The key column for comparison is called 'species_name' in both tables.

- I have already treated the tervuren_target table so that it contains no duplicate records
          within itself with respect to the species_name column.

- There are a couple of columns I do not have to carry over to the species_master table from the tervuren_target table. For the columns that do have to come over, I made sure they are named exactly the same in both tables and that the collation is the same in both (see the sketch after this list for how that can be checked).

- The field names common to both the source of my data (i.e., tervuren_target) and
           the destination for the new data (species_master) are:

                genus_name, species_name, authorities_species, source

- The species_master table has about 18 fields in total, but apart from the ones listed above, none of the other fields will receive any new data whatsoever. They are (so to speak) just along for the ... ride. All new data will go exclusively into the 4 fields listed above.

- Whatever is done to avoid new duplicates forming as data comes over into species_master, at NO TIME WHATSOEVER can we allow any records in species_master to be deleted! The reason is that meaningful data in the other fields of species_master would be
           lost and data integrity would be violated.
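
For anyone following along, here is roughly how working copies like these can be made, and how the column collations can be compared, directly in MySQL. This is only a sketch under my own assumptions (that the originals are tables named 'species' and 'tervuren' in database 'taxa'); I actually worked from .sql dump files, so adjust the names as needed:

-- make structural clones of the originals, then copy the rows across
CREATE TABLE species_master LIKE species;
INSERT INTO species_master SELECT * FROM species;

CREATE TABLE tervuren_target LIKE tervuren;
INSERT INTO tervuren_target SELECT * FROM tervuren;

-- confirm the shared key column has the same name and collation in both tables
SHOW FULL COLUMNS FROM species_master WHERE Field = 'species_name';
SHOW FULL COLUMNS FROM tervuren_target WHERE Field = 'species_name';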

Just adding ALL records from tervuren_target was easy, but it sure did not give me the result I need. Both the tervuren_target table and the species_master table contain wood names that are common to each. As records are copied over, great care has to be taken to avoid NEW duplicates forming from the combination of the two tables. In fact, simply merging all records from tervuren_target into species_master created over 3,000 NEW DUPLICATES! The result was an unusable mess!
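
(For what it is worth, a count along these lines is how duplicates of that kind can be spotted after a merge; I am assuming species_name alone decides whether two rows are the same wood:)

SELECT species_name, COUNT(*) AS copies
FROM species_master
GROUP BY species_name
HAVING COUNT(*) > 1
ORDER BY copies DESC;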

===> .... and that is exactly where the problem lies that has stopped me from being successful!

I am hoping that a single SQL statement can do all the work .... but if that will not do it, I am also open to
adding some PHP lines for more versatility if needed.

I have made a few column name changes since approaching the group back around May 10, but other than that nothing has changed.
Tonight I tried out a script that was suggested to me back then:

SELECT DISTINCT tervuren_target.genus_name, tervuren_target.species_name, tervuren_target.authorities_species, tervuren_target.source
FROM tervuren_target
LEFT JOIN species_master
ON tervuren_target.species_name = species_master.species_name
LIMIT 0,12000

I had to add the LIMIT clause or, for some reason, the server would time out every time. The script ran OK, but when I looked at the number of records it returned (11,377), that was EXACTLY how many records there are in the original Tervuren file! In other words, it did not avoid the duplicates at all, and the data did NOT actually merge into species_master <big sigh!>. I also predicted that DISTINCT would only operate on the tervuren_target rows (which had already been cleaned of duplicates within itself), and I believe I was right. It seems to be extraneous and not the needed answer.
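
From reading around since then, I wonder whether the missing pieces are a test for the rows where the LEFT JOIN finds no match, plus an INSERT wrapped around the SELECT. Something roughly like this (untested against my live data, and assuming species_name alone decides whether a wood is new):

INSERT INTO species_master (genus_name, species_name, authorities_species, source)
SELECT t.genus_name, t.species_name, t.authorities_species, t.source
FROM tervuren_target AS t
LEFT JOIN species_master AS s
    ON t.species_name = s.species_name
WHERE s.species_name IS NULL;

The WHERE s.species_name IS NULL part is what my attempt above lacked: it keeps only the tervuren_target rows for which no match was found in species_master, so nothing already there is re-inserted and nothing in species_master is deleted or changed. But I would appreciate anyone confirming (or correcting) this.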

.... I had never thought it should be this hard just to add new data to a table of central importance without
introducing new duplicate records, but it is.

Therefore I come in humble request to all of you, hoping someone can figure out what will actually work. The solution is harder than it may first seem. My email address, if you need it, is billmudry at rogers.com.

With thanks in advance,

Fingers crossed, toes crossed, eyeballs crossed .......

Bill Mudry
Mississauga, Ontario Canada.


