Re: Recursive Arrays 101

On 10/26/2015 01:51 PM, David Blomstrom wrote:
I'm focusing primarily on vertebrates at the moment, which come to a total
of (I think) about 60,000-70,000 rows across all taxa (species, families,
etc.). My goal is to create a customized database that does a really
good job of handling vertebrates first, manually adding a few key
invertebrates and plants as needed.

I couldn't possibly repeat the process with invertebrates or plants,
which are simply overwhelming. So, if I ever figure out the Catalogue of
Life's database, then I'm simply going to modify its tables so they work
with my system. My vertebrates database will override their vertebrate
rows (except for any extra information they have to offer).

As for "hand-entry," I do almost all my work in spreadsheets. I spent a
day or two copying scientific names from the Catalogue of Life into my
spreadsheet. Common names and slugs (common names in URL format) are a
project that will probably take years. I might type a scientific name or
common name into Google and see where it leads me. If a certain
scientific name is associated with the common name "yellow birch," then
its slug becomes yellow-birch. If two or more species are called yellow
birch, then I enter yellow-birch in a different table ("Floaters"),
which leads to a disambiguation page.
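
The common-name-to-slug step itself is easy to script. Here is a minimal
sketch in Python, assuming the only rules are lowercasing and joining the
words with hyphens; the plain sets standing in for the "Floaters" table are
just an assumption for illustration.

def make_slug(common_name):
    # "Yellow Birch" -> "yellow-birch"
    return "-".join(common_name.lower().split())

seen = set()      # slugs already assigned to a species
floaters = set()  # slugs shared by two or more species -> disambiguation page

for name in ["American Beaver", "Yellow Birch", "Yellow Birch"]:
    slug = make_slug(name)
    if slug in seen:
        floaters.add(slug)   # second species with the same common name
    else:
        seen.add(slug)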

For organisms with two or more popular common names - well, I haven't
really figured that out yet. I'll probably have to make an extra table
for additional names. Catalogue of Life has common names in its
database, but they all have uppercase first letters, like American
Beaver. That works fine for a page title, but in regular text I need to
make "beaver" lowercase without changing "American". So I'm just starting
from square one and recreating all the common names from scratch.

I think there has to be a better way, as this is just a formatting issue. I can't remember what programming language you are working in, but in Python:

In [13]: s = 'American Beaver'

In [14]: s.capitalize()
Out[14]: 'American beaver'

In [15]: s.lower()
Out[15]: 'american beaver'


It gets still more complicated when you get into "specialist names." ;)
But the system I've set up so far seems to be working pretty nicely.

On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent@xxxxxxxxx> wrote:

    On 10/26/2015 02:29 PM, David Blomstrom wrote:

        Sorry for the late response. I don't have Internet access at
        home, so I only post from the library or a WiFi cafe.

        Anyway, where do I begin?

        Regarding my "usage patterns," I use spreadsheets (Apple's
        Numbers program) to organize data. I then save it as a CSV file
        and import it into a database table. It would be very hard to
        break with that tradition, because I don't know of any other way
        to organize my data.

        On the other hand, I have a column (Rank) that identifies
        different taxonomic levels (kingdom, class, etc.). So I can
        easily sort a table into specific taxonomic levels and save one
        level at a time for a database table.

        There is one problem, though. I can easily put all the
        vertebrate orders and even families into a table. But genera
        might be harder, and species probably won't work; there are
        simply too many. My spreadsheet program is almost overwhelmed by
        fish species alone. The only solution would be if I could import
        Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But that
        might be kind of tedious, especially if I have to make multiple
        updates.

    Yes, I suspect your spreadsheet will be limited in rows, but of course
    you can send all the spreadsheets to a single table in the database,
    if that's what you want.  You don't have to, but you see mention of
    tables with millions of records routinely.  On the other hand, if
    performance becomes an issue with the single-table approach, you
    might want to look at "partitioning".  But I would be surprised if
    you had to go there.

    What is your data source?  How much hand-entry are you doing? There
    are tools which (seriously) upgrade the basic 'COPY into <table>'
    command.
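
Here is a minimal sketch of that basic COPY route, assuming psycopg2 and a
hypothetical taxa(scientific_name, common_name, taxon_rank) table whose
columns match the CSV files; the connection string is also an assumption.

import psycopg2

conn = psycopg2.connect("dbname=life")  # hypothetical connection string
with conn, conn.cursor() as cur:
    for csv_file in ["Mammals.csv", "Birds.csv", "Reptiles.csv"]:
        with open(csv_file) as f:
            # copy_expert exposes the full COPY ... FROM STDIN syntax,
            # including CSV parsing and a header row, so each file loads
            # with a single COPY statement.
            cur.copy_expert(
                "COPY taxa (scientific_name, common_name, taxon_rank) "
                "FROM STDIN WITH (FORMAT csv, HEADER)",
                f,
            )

Even plain COPY like this is far faster than loading the same rows with
one INSERT per row.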


        As for "attributes," I'll post my table's schema, with a
        description, next.





--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org


--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

