Re: Packages, inner subprograms, and parameterizable anonymous blocks for PL/pgSQL

Bryn Llewellyn <bryn@xxxxxxxxxxxx> · Wed, 22 Dec 2021 17:27:25 -0800

david.g.johnston@xxxxxxxxx wrote:

bryn@xxxxxxxxxxxx wrote:

Is this model not used for the PostgreSQL system?

It is not.  Basically we have an "Proposed Patches" tracker and they either get committed, or they do not.  Once committed, new features become available at the next annual release.

1. I see that a schema variable will be another kind of object in the class that has tables, views, sequences, indexes, UDTs, procedures, functions, and so on. So it’s persisted within a database; it’s owned by a user; and it’s localized in a schema. Does it have its own namespace too? For example, can a table x, a function x(), and a schema variable x all live in the same schema s in some database. And can I use the normal qualified name syntax to access a schema variable?

I haven't read the patch in great detail (or recently...) but: The first four items you listed (relations) share a namespace.  Procedures and functions share a different namespace.  I'm presuming schema variables will share the relations namespace.  These namespaces are subdivisions of schemas.

2. It seems, then, that all the familiar notions, and especially the ability for a non-owner to access it apply. I see that the SELECT privilege governs who can read the value of a schema variable. But there seems to be no mechanism that governs who can change the value of a schema variable. It looks like only the owner can change the value—using the new LET SQL statement. Does this mean both that a top-level call from a client session that’s authorized as the owner can change it and a “security definer” subprogram with the same owner can change it—but that there is no explicit (say, WRITE) privilege for this. Is this what you intend? If so, why not have a WRITE privilege?

We’d probably call it INSERT and/or UPDATE privileges to avoid creating a new enumeration value.

4. You said “security definer function”. Is this what you mean? Or do you mean ANY function or procedure as long as the current user (i.e. the owner of the most tightly enclosing security definer unit) is the schema variable’s owner?

https://www.postgresql.org/docs/current/sql-createfunction.html

If the invoker (current user) and the function owner are the same user the specified mode is immaterial for purposes of that query.  Though in the presence of views and triggers it can still get a bit confusing.

5. Could you please consider allowing a CONSTANT schema variable (with the usual syntax and requirement for an initialization _expression_)? One very popular use of a spec-level package variable is for a universal constant that has to be accessed in several places—like, for example, the conversion factor between some metric unit and some imperial unit.

Interesting.  Specifically, though, constant even for a superuser and the owner? Usually constant is enforced by just not allowing people to update.

Another way to think of the question, are these immutable or stable?

3. What is the precedence scheme? For example, if a SQL statement in a PL/pgSQL unit has a restriction like this:

   …where col = x…

and x is both the name of an in-scope variable (or formal parameter) in the unit and the name of a schema variable?

When the table has a column called x, then there’s (at least sometimes) no way round a run-time collision error except to rename one of the X’s. (Qualifying the names doesn’t fix it.)

Usually qualifying fixes it just fine - just some syntax elements presently do not allow for a qualifier to be added preventing the option from being used.

Will it be the same with schema variables? The question extends to ordinary assignment statements that become SQL statements under the covers:

  v := a + b;

where b happens to be both an in-scope variable and a schema variable.

I'm light on specifics at the moment but this is already a solved problem and schema variables should stick with the existing convention.  Which I think is telling the user there is ambiguity and to add an appropriate qualifier.

[If I understand correctly], a schema variable is slightly augmented shorthand for what today can be done by writing a scalar subquery (i.e., schema.col == (SELECT tbl.vars FROM schema.vars)) with the presumption that the table vars has exactly one row.

Thanks for clarifying how code that brings a new feature gets into a PG Release.

1. Yes, I worded my question about namespaces loosely aiming only to elicit a clear account of the rules. Having schema variables share the relations namespace sounds sensible.

2. Your point about preferring to reuse existing keywords over inventing new ones is well-taken. UPDATE seems to me to be the right choice. If we can informally consider NULL to be a special value rather than the total absence of information, then a schema variable always has exactly one value. So INSERT has nowhere to do it. I’ll take the fact that you considered how to name the keyword to mean that you see no a priori reason to have a rule that only the owner of a schema variable can change its value. This implies that the spirit of the usual privilege model should hold for schema variables too.

4. Here, too, my question aimed only to elicit an unambiguous statement of the rules. I suggest this:

The security mode of a procedure or function, and the joint effect of these modes when the stack has units some of which might be “invoker” and some of which might be “definer”, govern the ability of a subprogram to change the value of a schema variable in exactly the same way as they govern the ability of a subprogram to execute change-making DML statements on a table.

5. About CONSTANT, I’d argue that the semantic model is already well defined in the context of a PL/pgSQL constant. The schema variable brings no new challenges; and so there’s no need for a new model. The initial value is defined in the source code. And you can’t do anything with ordinary run-time mechanisms to change that initial value once it’s set. The understanding extends to the case that the initial value isn’t a platonic constant but can be set using, say, clock_timestamp(). If you don’t like how the initial value is set, then you can change this with a DDL. So it would be with a CONSTANT schema variable, too. (After all, a superuser can’t magically tunnel through and change the value of a PL/pgSQL CONSTANT except by re-defining the source code. I don’t know if there’s a need for an ALTER statement for a schema variable. Presumably, CREATE OR REPLACE would suffice.

The case of initializing a CONSTANT to, say, clock_timestamp() raises a question: “At what moment is this done?”. Is it at session creation time? Or on first use? I suppose that this dilemma could be side-stepped by requiring that the initialization must be to a manifest constant _expression_. This would cover many cases of how package constant are used (Avagadro’s number and the like).

3. Here’s an example of the scenario class that I had in mind. First a problem-free case:

create table u1.t(k int primary key, v text not null);
...
drop function if exists u1.f(int) cascade;
create function u1.f(k in t.k%type)
  returns t.v%type
  language plpgsql
as $body$
begin
  return (select t.k from t where t.k = f.k);
end;
$body$;

Here, f() runs without error and, with data in the table, gives the expected result. Now the case that gives a run-time error:

create procedure u1.p(k in t.k%type, v in t.v%type)
  security invoker
  language plpgsql
as $body$
begin
  update t set t.v = p.v where t.k = p.k;
end;
$body$;

At run-time, p() terminates with an obscurely worded error:

42703: column "t" of relation "t" does not exist.

Some people like this style (in a database where it works). You don’t have to think up new names for things that already have good names. Rather, you use just one set of spellings and dot-qualify them. I haven’t been able to find any way around this except to use different names for the formals, like k_in and v_in. This is feeble because anybody can later add a column to the table with one of those names. And then the run-time error comes back. It’s far better to rely on syntax than convention. But I don’t suppose that this will ever change.

I suppose, therefore, that similar practices (subject to capture in the same way) will be needed when you use schema variables.