Search Postgresql Archives

Re: Normalization tools for postgres?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dane Springmeyer wrote:
On Nov 21, 2007, at 1:37 AM, Richard Huxton wrote:
Dane Springmeyer wrote:

These mostly reside in MS access in flat tables and and I am importing them into postgres.

If you are familiar/comfortable with Access and VB, I'd probably do the work there.

Unfortunately I am not. I am only familiar with very basic SQL in Access and not with VB. And I am not interested in investing any time learning microsoft products. I'd rather put time in learning postgres and php or other languages to manipulate data in postgres.

Fair enough. It's easy enough to get started with PHP. Oh, you don't need to run it in a webserver, you can do so from the command-line too. Any of the scripting languages will do nicely for this sort of thing - Perl, Python, Ruby etc.


1. Import as-is into a table called e.g. raw_data

2. Identify/add the primary-key (presumably ID in this case) in raw_data
If no ID, you can add a column of type SERIAL to raw_data and let it be populated automatically.

3. CREATE TABLE lookups.region (id SERIAL, description text NOT NULL UNIQUE, PRIMARY KEY id); INSERT INTO lookups.region (description) SELECT DISTINCT region from raw_data;

4. Repeat #3 for other lookups

5. CREATE TABLE processed_data (...);
INSERT INTO processed_data (id, region_id, ...)
SELECT raw.id, lkp_reg.id, ...
FROM raw_data raw
JOIN lookups.region lkp_reg ON raw.region = lkp_reg.description
JOIN lookups.whatever...

Wow. That was EXTREMELY helpful. With those concepts I've not been able to do EXACTLY what I was shooting for AND now understand the SQL well enough to start thinking of better ways to do it as well. Thank you.

Good.

Here is the SQL which inserts the sample data and processes it into 4 different tables. Perhaps I after you take a look I could post this back to the group?

I've cc:ed the list on this one for you - plenty of smart people on that list.

CREATE TABLE raw_data (

Later on, you might want to consider CREATE TEMPORARY TABLE... but don't worry for the moment.

    wid int4,
    region character(35),
    drain character(65),
    eco character(29)

These should almost certainly be "varchar" (or "character varying") - unless you actually want each field padded with spaces. In fact, since this is just a temporary thing I'd make them all type "text".

);

INSERT INTO raw_data VALUES (11210, 'SW Oregon / N Cali Coast', 'Rogue River', 'California Coast'); INSERT INTO raw_data VALUES (11100, 'SW Oregon / N Cali Coast', 'Coastal grouping of Chetco River and Pistol River', 'California Coast');

You might want to read the manuals regarding the COPY <table> FROM STDIN command. Good for bulk-loading data.

Also, you could download the pg-odbc driver and link to PG from Access to copy the data over.


drop table region;

There's an "IF EXISTS" clause you can add to DROP TABLE - prevents errors.

CREATE TABLE region (id SERIAL, name text NOT NULL UNIQUE, PRIMARY KEY (id));
INSERT INTO region (name) SELECT DISTINCT region from raw_data;
Select * from region;
[snip repeats for other tables]
drop table processed_data;
CREATE TABLE processed_data (wid int4 NOT NULL UNIQUE, region_id int4 NOT NULL, eco_id int4 NOT NULL, drain_id int4 NOT NULL, PRIMARY KEY (wid));
INSERT INTO processed_data (wid, region_id, eco_id, drain_id)
SELECT w.wid, r.id, e.id, d.id
FROM raw_data as w, region as r, eco as e, drain as d
WHERE w.region = r.name AND w.drain = d.name AND w.eco = e.name;

Good. You'll want to read up on foreign-keys too. You can define them when you build the table, or add them after. Something like (not checked):
ALTER TABLE processed_data ADD CONSTRAINT valid_region
FOREIGN KEY (region_id) REFERENCES region (id);

6. Wrap the entire lot above in BEGIN...END so it either all works or it doesn't then VACUUM FULL the database at key points.

I'm not familiar with these ideas, but I'll look into them in the docs...

Read up on VACUUM, ANALYSE and the autovacuum daemon (agent).

If you're comfortable with a bit of VB/Perl/Python/PHP/plpgsql then you can automate that fairly simply. If not, a bit of cut+paste will see you there.

Yes, I think I'll experiment with trying to produce this SQL text with php. Seems like in a very short time I could have a custom script to parse any table given column names and types which I'd like to break out. Very nice.

The fun comes when you have to clean up the data - correct spelling mistakes, remove duplicates etc.

--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux