Re: Could postgres12 support millions of sequences? (like 10 million)

I see. 

Any suggestions? It should behave like a sequence in the sense that concurrent transactions will get different numbers from this alternative, sequence-like solution.

In our case, we would need to call nextval('some_seq') (or similar) from different processes no more than twice every minute.
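
For reference, today that means one real sequence per (group, element) pair, named by convention; a sketch, e.g. code_1_1_seq for group 1, element 1:

CREATE SEQUENCE code_1_1_seq
    INCREMENT 1 START 1 CACHE 1
    MINVALUE 1 MAXVALUE 99999999;

SELECT nextval('code_1_1_seq');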


It would be nice to have a sequence data type. Correct me if I am wrong, but it seems to me that a sequence data type would cost the same as, or less than, the current sequence implementation.

The schema would be clearer, too. We could have a table like:

CREATE TABLE pair(
    "group" INT NOT NULL,   -- GROUP is a reserved word, so it must be quoted
    element INT NOT NULL,
    seq     SEQUENCE INCREMENT 1
                     START 1
                     CACHE 1
                     MINVALUE 1
                     MAXVALUE 99999999
            NOT NULL,
    PRIMARY KEY ("group", element)
);

And then:

INSERT INTO event ("group", element, code)
VALUES (
         1,
         1,
         ( SELECT nextval(p.seq) FROM pair p WHERE p."group" = 1 AND p.element = 1 )
);

Or perhaps storing all the sequences as rows in a single table would give the same behavior.
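
Something close to that works today with a plain counter table and UPDATE ... RETURNING; a minimal sketch (table and column names are made up):

CREATE TABLE pair_counter(
    grp     INT NOT NULL,
    element INT NOT NULL,
    seq     BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (grp, element)
);

-- Each caller locks only its own row, so concurrent transactions on the
-- same (grp, element) serialize briefly and each gets a distinct value:
UPDATE pair_counter
   SET seq = seq + 1
 WHERE grp = 1 AND element = 1
RETURNING seq;

Unlike a real sequence, the increment rolls back with the enclosing transaction, so this is gap-free but holds the row lock until commit; at two calls per minute that should not matter.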

Pablo


On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Michael Lewis <mlewis@xxxxxxxxxxx> writes:
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston <david.g.johnston@xxxxxxxxx>
> wrote:
>> However, one other consideration with sequences: do you care that
>> PostgreSQL will cache/pin (i.e., no release) every single sequence you
>> touch for the lifetime of the session? (I do not think DISCARD matters here
>> but I'm just guessing)

> Would you expand on this point or is there someplace specific in the
> documentation on this?

I think what David is worried about is that a sequence object is a
one-row table in PG's implementation.  Thus

(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).

(2) each sequence uses up 8KB on disk for its table file.

(3) each sequence you actually access within a particular session
results in creation of relcache and catcache entries in that
session's local memory.  I'm not sure offhand how big those are
either, but a few KB per sequence would be a reasonable guess.

(4) each sequence competes for space in the shared-buffer arena,
since its 8K block has to be swapped into there whenever you try
to access/increment the sequence's value.
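
You can see (1) and (2) for a single sequence directly from psql; for
example (probe_seq is just a throwaway name):

CREATE SEQUENCE probe_seq;

-- (2): the sequence's table file is one 8KB block:
SELECT pg_relation_size('probe_seq');

-- (1): a sample of the catalog rows behind it (not an exhaustive count):
SELECT count(*) FROM pg_attribute WHERE attrelid = 'probe_seq'::regclass;
SELECT count(*) FROM pg_depend
 WHERE objid = 'probe_seq'::regclass OR refobjid = 'probe_seq'::regclass;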

This is all mighty inefficient of course, and there's been talk
of trying to reduce the per-sequence overhead; but I don't know
of anyone actively working on that.  As things stand, I think
having millions of sequences would be quite painful performance-
wise, especially if your workload were actively hitting a lot
of them concurrently.  It would work, for some value of "work",
but it wouldn't perform very well.

Also, as multiple people mentioned already, this still wouldn't
guarantee gap-free sequences of ID values.

                        regards, tom lane
