Re: When is it safe to mark a function PARALLEL SAFE?

Tom Lane <tgl@xxxxxxxxxxxxx> · Sun, 08 Sep 2019 15:27:08 -0400

Jim Finnerty <jfinnert@xxxxxxxxxx> writes:
> According to the documentation:
> "Functions and aggregates must be marked PARALLEL UNSAFE if they write to
> the database, access sequences, change the transaction state even
> temporarily (e.g. a PL/pgSQL function which establishes an EXCEPTION block
> to catch errors), or make persistent changes to settings."

I believe the reason for the EXCEPTION-block restriction is that plpgsql
does that by establishing a subtransaction, and we don't allow
subtransactions in workers.  It seems like that's probably just an
implementation restriction that could be lifted with a little work,
much more easily than the general prohibition on writing-to-the-DB
could be.  (Obviously, the subtransaction would still be restricted
from DB writes.)  The "persistent change to settings" rule is there
not because it would fail, but because it wouldn't be persistent ---
the GUC change would only be visible inside the particular worker.
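
To make the EXCEPTION-block case concrete, here is a minimal sketch of a
function that falls under the rule as it stands today.  The name safe_div
and its body are made up for illustration; the point is only that the
EXCEPTION clause makes plpgsql start a subtransaction on entry to the
block, whether or not an error is ever thrown.

```sql
-- Hypothetical function; the name and body are illustrative only.
CREATE FUNCTION safe_div(a numeric, b numeric) RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
PARALLEL UNSAFE   -- the EXCEPTION block below starts a subtransaction
AS $$
BEGIN
    RETURN a / b;
EXCEPTION
    WHEN division_by_zero THEN
        RETURN NULL;
END;
$$;
```

PARALLEL UNSAFE is the default, so leaving the label off has the same
effect; spelling it out just documents the reason.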

> If a LANGUAGE C function calls ereport(ERROR, ...), does that qualify as a
> potential change to the transaction state that requires it to be marked
> PARALLEL UNSAFE?

No.  It would certainly be impractical to have a rule that you can't
throw errors in workers.
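
The same distinction can be sketched in plpgsql, assuming a made-up
function require_positive: it throws an error with RAISE but has no
EXCEPTION clause, so it never establishes a subtransaction and, as far
as I can see, nothing stops it from being labeled PARALLEL SAFE.

```sql
-- Hypothetical function; name and check are illustrative only.
CREATE FUNCTION require_positive(n integer) RETURNS integer
LANGUAGE plpgsql
IMMUTABLE
PARALLEL SAFE     -- raises errors but never catches them: no subtransaction
AS $$
BEGIN
    IF n <= 0 THEN
        RAISE EXCEPTION 'expected a positive value, got %', n;
    END IF;
    RETURN n;
END;
$$;
```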

> If an error is raised in one parallel worker, does this
> cause the other parallel workers to be immediately terminated?

I think not, though I didn't work on that code.  The error will
be reported to the parent backend, which will cause it to fail
the query ... but I think it just waits for the other worker
children to exit first.  That's not something to rely on, of course.
Even if we don't make an attempt to cancel the other workers today,
we probably will in the future.  But the cancel attempt would certainly
be asynchronous, so I'm not sure how "immediate" you are worried
about it being.

> How about a C function f(x) that calls out to an external system and returns
> a text value.  If f(x) is altered on the external system, it might return a
> slightly different answer for some x.  Let's say that for some x it returns
> "one" instead of "1", and we happen to know that users don't care if it
> returns "one" or "1".  If someone were to declare f(x) to be PARALLEL SAFE,
> what's the worst that could happen?

Well, this isn't so much about whether the function is parallel safe
as whether it is marked volatile or not; as you describe it, it would
potentially give time-varying results even in a non-parallel query.
Such a function should be marked volatile to avoid strange behavior,
i.e., the optimizer making invalid assumptions.

AFAIK "parallel safe" and "non-volatile" are more or less independent
restrictions, though someone might correct me.  A function that writes
to the DB must be considered both volatile and parallel unsafe, but
if it doesn't do that then I think it could have any combination
of these properties.
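
For the f(x) case described above, the declaration might look like the
sketch below.  The library name 'my_module' and link symbol 'f' are
placeholders, not anything from the original report, and the PARALLEL
SAFE label assumes the C code really does stay clear of DB writes,
sequences, subtransactions and persistent settings changes.

```sql
-- Hypothetical declaration: 'my_module' and 'f' are placeholders.
CREATE FUNCTION f(x text) RETURNS text
AS 'my_module', 'f'
LANGUAGE C
VOLATILE          -- results can change over time, so no constant-folding
PARALLEL SAFE;    -- assumed: no DB writes, sequences, subtransactions,
                  -- or persistent GUC changes inside the C code
```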

			regards, tom lane