Using the power of the PC to find new species in lists

I have spent years now trying to find as many species of woody plants 
as I can. A common practice has been to manually compare a target list 
of species against a master list, i.e. a file (usually a text file 
(ASCII, .doc, comma delimited, etc.), spreadsheet or database file) 
that lists the species I have already found. The major task of each 
comparison has been to separate out the new species not yet in the 
master list and write them into a new file. 
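
To make the task concrete, the core of such a comparison might look 
something like the PHP below. This is only a rough sketch under some 
assumptions: both lists are plain text with one species name per line, 
names are matched without regard to upper or lower case, and the 
function name findNewSpecies and the file names in the comments are 
made up for illustration.

```php
<?php
// Hypothetical helper: returns the names in $targetLines that do not
// appear in $masterLines. Assumes one species name per line.
function findNewSpecies(array $masterLines, array $targetLines): array
{
    // Build a lookup table keyed by the lower-cased, trimmed master names,
    // so each target name can be checked without rescanning the master list.
    $known = [];
    foreach ($masterLines as $line) {
        $key = strtolower(trim($line));
        if ($key !== '') {
            $known[$key] = true;
        }
    }

    // Keep every non-blank target name that is not in the lookup table.
    $new = [];
    foreach ($targetLines as $line) {
        $name = trim($line);
        if ($name !== '' && !isset($known[strtolower($name)])) {
            $new[] = $name;
        }
    }
    return $new;
}

// Possible usage with made-up file names:
// $master = file('master.txt', FILE_IGNORE_NEW_LINES);
// $target = file('target.txt', FILE_IGNORE_NEW_LINES);
// $new    = findNewSpecies($master, $target);
// file_put_contents('new_species.txt', implode(PHP_EOL, $new) . PHP_EOL);
```

Keying an array by name means the master list is read only once, 
however long the target list is. Duplicates within the target list are 
not removed here; they would simply show up more than once in the output.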

Since the master list now has over 6,000 species listed, you can just 
imagine that manual comparisons have taken me hours and hours, if not 
days! I know, too, that comparing two lists is one of the most classic 
applications for a computer. What takes me hours, it should be able to 
whip through in seconds! I have a large collection of target files, so 
having computer power to do this would be equivalent to lighting a 
rocket under the project!

I am surprised that it has been that hard for me to find something that 
would work when this should be quite basic coding.

There are all kinds of file comparison programs out there. The largest 
problem I have come across using them is that they only show the 
differences side by side, highlighted in color, where I need all records 
that are new to the master file to be written into a new file instead. 
I have been a bit surprised how hard it has been to find such programs, 
since this is one of the tenets of basic computer training. ISAM files, 
for example, used to be merged and compared very frequently in mainframe 
days to update company records.

Any suggestions on code in PHP, or even a finished application, 
that would do this? If I could get the code, that would be even better, 
so that later I can tweak it to fit what I do even more. A simple 
comparison would be just fine for now. Later I hope to improve on it in 
a number of ways, such as:
        - Eliminating duplicates before comparison
        - Being able to do comparisons on a variety of file formats
        - Facilities to massage target files into a common set of
          fields before trying to compare them. The target files can
          come in all kinds of file formats and layouts that need
          massaging before comparisons would mean anything.
        - Cleaning out control characters and other garbage from a
          target file before comparisons.

--- but for now and for some time to come, all these extras are a lot of
coding work that can wait ;-) . I would be so elated just to be able
to run through such file comparisons at record-breaking speed 
compared to my manual methods.
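
For what it is worth, two of the later improvements listed above 
(cleaning out control characters and eliminating duplicates) might be 
roughed out as below. Again this is only a sketch with assumptions: one 
name per line, "garbage" taken to mean ASCII control characters only, 
duplicates judged without regard to case, and the function names 
cleanSpeciesName and cleanSpeciesList invented for illustration.

```php
<?php
// Hypothetical helper: tidies one raw line from a target file.
function cleanSpeciesName(string $raw): string
{
    // Replace ASCII control characters (0x00-0x1F and 0x7F) with a space.
    $text = preg_replace('/[\x00-\x1F\x7F]+/', ' ', $raw) ?? '';
    // Collapse runs of spaces left behind into a single space.
    $text = preg_replace('/ {2,}/', ' ', $text) ?? '';
    return trim($text);
}

// Hypothetical helper: cleans every line, drops blank results, and keeps
// only the first occurrence of each name (compared case-insensitively).
function cleanSpeciesList(array $rawLines): array
{
    $seen  = [];
    $clean = [];
    foreach ($rawLines as $raw) {
        $name = cleanSpeciesName($raw);
        $key  = strtolower($name);
        if ($name === '' || isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;
        $clean[]    = $name;
    }
    return $clean;
}
```

A target list could be passed through cleanSpeciesList before it is 
compared, so stray characters do not make a known species look new.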

P.S. -- I am just starting to gain some proficiency in using PHP.
Studying such code would also be a good learning exercise.

Always in thanks for any help given,

Bill Mudry
Mississauga, Ontario
