Melvin: haha, yeah, it's a download from the Clark County, NV voterfile website. It's just the format they send out to people who request the file. I worked this summer doing QA on voterfile builds so I'm familiar with the data. I thought it would be good stuff to start with.
But thank you for the code in how to fix that flat file. I'm half-teaching myself SQL and building a database for some game data i have with the plan to sit Tableau on top of it.
btw, you guys are really nice and helpful. I'm definitely coming back, you're making me want to make some progress on that dashboard.
On Wed, Dec 7, 2016 at 8:58 PM, Melvin Davidson <melvin6925@xxxxxxxxx> wrote:
On Wed, Dec 7, 2016 at 8:25 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:On 12/07/2016 05:19 PM, metaresolve wrote:
Uh, yeah, it was a SELECT * from cc_20161207;
I know, it was dumb. I didn't realize it would break it or at least run for
a while. I tend to do things in small steps, run a query, check my results,
then tweak.
You're right, I wouldn't want to be viewing those million. so I guess I
could just be doing the top 10000 rows to get a sense, then be doing my
aggregate group bys/etc to get my final numbers? That's how I hacked Access
to be my data analysis program until I got to Alteryx. But I was also never
using files of 1M then.
FYI pgAdmin, if I remember correctly, has a setting that limits the maximum number of rows that it will fetch at a time.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
With regard to John's comments
>Postgres, on a properly scaled and tuned database server, can handle billions of records.
>Obviously, doing something silly like querying all billions at once will never be fast, thats a lot of data to marshal and process.
On a similar note, your table structure is not exactly optimized for a relational database.
I see no primary key to identify each row. In addition, the election & vote columns
belong in a separate table that is a child of cc_20161207, since they are repeating fields.
eg:
ALTER TABLE cc_20161207
ADD COLUMN cc_20161207_pk serial;
UPDATE cc_20161207
SET cc_20161207_pk = nextval(cc_20161207_pk_seq)
WHERE cc_20161207_pk IS NULL;
ALTER TABLE cc_20161207
ALTER COLUMN cc_20161207_pk SET NOT NULL;
ALTER TABLE cc_20161207
ADD CONSTRAINT cc_20161207 PRIMARY KEY (cc_20161207_pk);
CREATE TABLE election_data (
cc_20161207_pk bigint NOT NULL,
election_id serial NOT NULL,
election varchar(4),
vote_type varchar(1)
CONSTRAINT election_data_pk PRIMARY KEY (cc_20161207_pk, election_id)
);
--Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.