Re: Moving to postgresql and some ignorant questions

"Phoenix Kiula" <phoenix.kiula@xxxxxxxxx> writes:

> Through other threads I have learned that multiple inserts or updates
> can be sped up with:
>
> [QUOTE]
> - BEGIN TRANSACTION;
> - INSERT OR UPDATE queries, ideally PREPAREd;
> - COMMIT;
> [/QUOTE]
>
> QUESTION1:
> Is this it? Or am I missing something in terms of execution?  We use
> Perl on our site but may be gradually switching to PHP at some point.
> Will the above plan of execution be ok?

A transaction is a group of queries which you want to all be committed
or aborted together. The expensive step in Postgres is the COMMIT.
Postgres does an fsync, which forces the data to disk (assuming you're
using good hardware and don't have fsync=off). That takes 5-15ms,
depending on how much data you've written and how fast your drives are.

Grouping more work into a single transaction makes the delay for the
fsync at COMMIT time less of a problem. Having more connections helps
too (but not too many: more than a few per processor probably isn't
gaining you anything, and more than 100 is probably slowing you down),
since another backend can get some of its work done while yours is
waiting for the fsync.
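
As a minimal sketch of that pattern in plain SQL (the table and column
names here are invented for illustration):

    BEGIN;
    PREPARE ins (text, text) AS
        INSERT INTO links (keyword, url) VALUES ($1, $2);
    EXECUTE ins('foo', 'http://example.com/foo');
    EXECUTE ins('bar', 'http://example.com/bar');
    -- ...one EXECUTE per row in the batch...
    COMMIT;
    DEALLOCATE ins;

In Perl DBI the equivalent is $dbh->begin_work, a $dbh->prepare(...)
whose handle you execute() in a loop, then $dbh->commit.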

> My queries are all optimized and indexed well. But the fragmentation
> resulting from UPDATEs can be a pain. I wonder if PGSQL's autovacuum
> will help. As for autovacuum we have every intention of leaving it on.
> Will the following settings be alright?

Yes, it's a pain. Running vacuum frequently will be necessary. You may
also have to raise your free space map (fsm) settings to allow Postgres
to remember more free space between vacuums.
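
For example, in postgresql.conf (the numbers are purely illustrative;
run VACUUM VERBOSE and look at the free space map summary at the end
of its output to size these properly):

    max_fsm_pages = 500000      # page slots with free space to remember
    max_fsm_relations = 2000    # tables and indexes to track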

> [QUOTE]
> autovacuum = on
> vacuum_cost_delay = 30
> stats_start_collector = on
> stats_row_level = on
> autovacuum_vacuum_threshold = 100
> autovacuum_analyze_threshold = 50
> [/QUOTE]
>
> I am hoping that the frequent vacuum thresholds will help, but:
>
> QUESTION 2:
> Are these settings too aggressive? While an autovacuum is running in
> the background, will it lock tables or anything? Can the tables still
> be operational, and the autovacuum will automatically resume from
> whatever point it was at? I am worried about how autovacuum will
> perform when

VACUUM doesn't lock tables against normal use. It's designed to run
alongside regular reads and writes without interfering.

It does still take up i/o bandwidth which affects performance. The
vacuum_cost_delay = 30 above tells it to sleep 30ms after every few
pages of work to avoid slowing down production. You'll have to judge
from experience whether vacuums take too long at that setting. You may
be better off starting with 10ms or 20ms instead.

I don't think the threshold parameters will be relevant to you. You
should look at autovacuum_vacuum_scale_factor and
autovacuum_analyze_scale_factor. I may be wrong, though; someone more
familiar with autovacuum in 8.2 might have to speak up.
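
Something like this in postgresql.conf might be a starting point (the
values are only guesses to experiment from, not recommendations):

    vacuum_cost_delay = 20                   # try 10-20ms before 30
    autovacuum_vacuum_scale_factor = 0.1     # vacuum at ~10% dead rows
    autovacuum_analyze_scale_factor = 0.05   # analyze at ~5% changed rows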

> QUESTION 3.
> Data integrity checks in the MySQL world were very patchy, relying
> on CGI-level code to enforce them, which does not always work. We
> are trying our best to get rid of those checks. With postgresql, I
> realize we can have triggers as well as foreign keys. But searching
> through old threads on this group suggests that triggers also
> present performance problems. On tables that are infrequently
> updated, can I write triggers without worrying about performance?
> Or, how can I make sure their performance is as good as it can be,
> i.e., which of the config vars is responsible for that?

Triggers are quite efficient in Postgres but they still cost something. Nobody
will be able to give you a blanket statement that you can do anything without
testing it. But you're probably better off having them and then considering
removing them later if you have a problem.

Incidentally, most data integrity checks are handled with CHECK constraints
and FOREIGN KEY constraints rather than manual triggers. They're both easier
and cheaper.
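
For instance, a minimal sketch with invented tables:

    CREATE TABLE customers (
        id    integer PRIMARY KEY,
        email text NOT NULL CHECK (position('@' in email) > 0)
    );

    CREATE TABLE orders (
        id          integer PRIMARY KEY,
        customer_id integer NOT NULL REFERENCES customers (id),
        amount      numeric NOT NULL CHECK (amount > 0)
    );

An INSERT or UPDATE that violates either constraint is rejected by the
database itself, with no application-side code involved.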

> QUESTION 4:
> Character sets: In MySQL we had utf-8 as our default character set.
> Yet, on some tables we had the collation of latin1_general_ci, and
> only on specific columns we had "utf8_general_ci" (where non-English
> text needed to be stored). How should I do this in pgsql? When I do a
> mysqldump of these tables, and then COPY them back into pgsql, I
> always see the error "ERROR:  invalid byte sequence for encoding
> "UTF8": 0xa7". So I do a
>
> \encoding latin1
>
> And then my COPY import works. But this is not what I want. How can I
> set up one of my columns in this table to be utf-8, and the rest to be
> latin? Then I would like to import with "\encoding utf8". Can this be
> somehow done?

Sorry, this is one of the main deficiencies in Postgres. In 8.2 the
encoding is fixed per database and the collation is fixed for the whole
cluster at initdb time, so you can't switch either one per column or on
the fly. You will probably have to convert your data to utf8 across the
board and hope you can find a collation which satisfies all your needs.

You could look at the CONVERT function, which might help, but I'm not
sure exactly what you would have to do to solve your immediate problem.
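
For what it's worth, the 8.2-era form takes the string plus source and
destination encoding names, along these lines:

    SELECT convert('text imported as latin1', 'LATIN1', 'UTF8');

Whether that helps depends on where in your pipeline the bytes go
wrong.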

If you really need multiple collations in a single database, there's a
function pg_strxfrm() which was posted to this list a long time ago. It
probably needs updating for 8.2, but what it does is take a string and
a collation and return a bytea which sorts properly in that collation.
The problem is that it's not very efficient, and on some OSes it's
extremely inefficient.


-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

