Dane Springmeyer wrote:
On Nov 21, 2007, at 1:37 AM, Richard Huxton wrote:
Dane Springmeyer wrote:
These mostly reside in MS access in flat tables and and I am
importing them into postgres.
If you are familiar/comfortable with Access and VB, I'd probably do
the work there.
Unfortunately I am not. I am only familiar with very basic SQL in Access
and not with VB. And I am not interested in investing any time learning
microsoft products. I'd rather put time in learning postgres and php or
other languages to manipulate data in postgres.
Fair enough. It's easy enough to get started with PHP. Oh, you don't
need to run it in a webserver, you can do so from the command-line too.
Any of the scripting languages will do nicely for this sort of thing -
Perl, Python, Ruby etc.
1. Import as-is into a table called e.g. raw_data
2. Identify/add the primary-key (presumably ID in this case) in raw_data
If no ID, you can add a column of type SERIAL to raw_data and let it
be populated automatically.
3. CREATE TABLE lookups.region (id SERIAL, description text NOT NULL
UNIQUE, PRIMARY KEY id);
INSERT INTO lookups.region (description) SELECT DISTINCT region from
raw_data;
4. Repeat #3 for other lookups
5. CREATE TABLE processed_data (...);
INSERT INTO processed_data (id, region_id, ...)
SELECT raw.id, lkp_reg.id, ...
FROM raw_data raw
JOIN lookups.region lkp_reg ON raw.region = lkp_reg.description
JOIN lookups.whatever...
Wow. That was EXTREMELY helpful. With those concepts I've not been able
to do EXACTLY what I was shooting for AND now understand the SQL well
enough to start thinking of better ways to do it as well. Thank you.
Good.
Here is the SQL which inserts the sample data and processes it into 4
different tables. Perhaps I after you take a look I could post this back
to the group?
I've cc:ed the list on this one for you - plenty of smart people on that
list.
CREATE TABLE raw_data (
Later on, you might want to consider CREATE TEMPORARY TABLE... but don't
worry for the moment.
wid int4,
region character(35),
drain character(65),
eco character(29)
These should almost certainly be "varchar" (or "character varying") -
unless you actually want each field padded with spaces. In fact, since
this is just a temporary thing I'd make them all type "text".
);
INSERT INTO raw_data VALUES (11210, 'SW Oregon / N Cali Coast', 'Rogue
River', 'California Coast');
INSERT INTO raw_data VALUES (11100, 'SW Oregon / N Cali Coast', 'Coastal
grouping of Chetco River and Pistol River', 'California Coast');
You might want to read the manuals regarding the COPY <table> FROM STDIN
command. Good for bulk-loading data.
Also, you could download the pg-odbc driver and link to PG from Access
to copy the data over.
drop table region;
There's an "IF EXISTS" clause you can add to DROP TABLE - prevents errors.
CREATE TABLE region (id SERIAL, name text NOT NULL UNIQUE, PRIMARY KEY
(id));
INSERT INTO region (name) SELECT DISTINCT region from raw_data;
Select * from region;
[snip repeats for other tables]
drop table processed_data;
CREATE TABLE processed_data (wid int4 NOT NULL UNIQUE, region_id int4
NOT NULL, eco_id int4 NOT NULL, drain_id int4 NOT NULL, PRIMARY KEY (wid));
INSERT INTO processed_data (wid, region_id, eco_id, drain_id)
SELECT w.wid, r.id, e.id, d.id
FROM raw_data as w, region as r, eco as e, drain as d
WHERE w.region = r.name AND w.drain = d.name AND w.eco = e.name;
Good. You'll want to read up on foreign-keys too. You can define them
when you build the table, or add them after. Something like (not checked):
ALTER TABLE processed_data ADD CONSTRAINT valid_region
FOREIGN KEY (region_id) REFERENCES region (id);
6. Wrap the entire lot above in BEGIN...END so it either all works or
it doesn't then VACUUM FULL the database at key points.
I'm not familiar with these ideas, but I'll look into them in the docs...
Read up on VACUUM, ANALYSE and the autovacuum daemon (agent).
If you're comfortable with a bit of VB/Perl/Python/PHP/plpgsql then
you can automate that fairly simply. If not, a bit of cut+paste will
see you there.
Yes, I think I'll experiment with trying to produce this SQL text with
php. Seems like in a very short time I could have a custom script to
parse any table given column names and types which I'd like to break
out. Very nice.
The fun comes when you have to clean up the data - correct spelling
mistakes, remove duplicates etc.
--
Richard Huxton
Archonet Ltd
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq