On 3/20/20 9:59 AM, Adrian Klaver wrote:
On 3/19/20 10:31 PM, pabloa98 wrote:
I see.
Any suggestions? It should behave like a sequence, in the sense that
concurrent transactions will get different numbers from this
alternative sequence-like solution.
In our case, we will need to call nextval('some_seq') (or similar)
from different processes no more than twice every minute.
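With plain sequences today, that is just something like this (one
sequence per group/element pair; the names here are only an example):

CREATE SEQUENCE IF NOT EXISTS code_1_1_seq
    MINVALUE 1
    MAXVALUE 99999999
    CACHE 1;

-- Each concurrent caller gets a distinct number.
SELECT nextval('code_1_1_seq');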
It would be nice to have a sequence data type. Correct me if I am
wrong, but it seems to me that a sequence data type would cost the
same as, or less than, the current sequence implementation.
The schema would be clearer too. We could have a table like:
CREATE TABLE pair(
    group   INT NOT NULL,
    element INT NOT NULL,
    seq     SEQUENCE INCREMENT 1
                     START 1
                     CACHE 1
                     MINVALUE 1
                     MAXVALUE 99999999
            NOT NULL,
    PRIMARY KEY (group, element)
);
And then:
INSERT INTO event(group, element, code)
VALUES (
    1,
    1,
    (SELECT seq.nextval('code_1_1_seq') FROM pair p
      WHERE p.group = 1 AND p.element = 1)
);
Or perhaps storing all the sequences as rows in a single table would
give the same behavior.
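Something like this, I imagine (just a sketch, with example names: an
ordinary table plus UPDATE ... RETURNING, so concurrent transactions
each get a distinct number, at the cost of serializing writers on the
row):

CREATE TABLE pair_seq(
    group_id INT NOT NULL,
    element  INT NOT NULL,
    seq      BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (group_id, element)
);

-- Claim the next number for (1, 1); a concurrent transaction blocks
-- on the row lock until we commit, then gets the following number.
UPDATE pair_seq
   SET seq = seq + 1
 WHERE group_id = 1 AND element = 1
RETURNING seq;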
If code is just something to show the insertion order for (group,
element) combinations, then maybe something like below:
CREATE TABLE event(
    group_id  INT NOT NULL, -- changed, as group is a reserved word
    element   INT NOT NULL,
    insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
    PRIMARY KEY (group_id, element, insert_ts)
);
insert into event(group_id, element) VALUES
(1, 1),
(1, 1),
(1, 1),
(2, 1),
(1, 1),
(1, 3),
(1, 1),
(1, 3),
(2, 1),
(2, 1);
select * from event ;
 group_id | element |           insert_ts
----------+---------+--------------------------------
        1 |       1 | 03/20/2020 09:51:12.675926 PDT
        1 |       1 | 03/20/2020 09:51:12.675985 PDT
        1 |       1 | 03/20/2020 09:51:12.675991 PDT
        2 |       1 | 03/20/2020 09:51:12.675996 PDT
        1 |       1 | 03/20/2020 09:51:12.676 PDT
        1 |       3 | 03/20/2020 09:51:12.676004 PDT
        1 |       1 | 03/20/2020 09:51:12.676008 PDT
        1 |       3 | 03/20/2020 09:51:12.676012 PDT
        2 |       1 | 03/20/2020 09:51:12.676016 PDT
        2 |       1 | 03/20/2020 09:51:12.67602 PDT
(10 rows)
select group_id, element,
       row_number() OVER (partition by (group_id, element)
                          order by (group_id, element)) AS code
from event;
 group_id | element | code
----------+---------+------
        1 |       1 |    1
        1 |       1 |    2
        1 |       1 |    3
        1 |       1 |    4
        1 |       1 |    5
        1 |       3 |    1
        1 |       3 |    2
        2 |       1 |    1
        2 |       1 |    2
        2 |       1 |    3
(10 rows)
Oops, the above actually returned:
select group_id, element,
       row_number() OVER (partition by (group_id, element)
                          order by (group_id, element)) AS code,
       insert_ts
from event;
 group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
        1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
        1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
        1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
        1 |       1 |    4 | 03/20/2020 09:51:12.676008 PDT
        1 |       1 |    5 | 03/20/2020 09:51:12.676 PDT
        1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
        1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
        2 |       1 |    1 | 03/20/2020 09:51:12.67602 PDT
        2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
        2 |       1 |    3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)
Needs to be:
select group_id, element,
       row_number() OVER (partition by (group_id, element)
                          order by (group_id, element, insert_ts)) AS code,
       insert_ts
from event;
 group_id | element | code |           insert_ts
----------+---------+------+--------------------------------
        1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
        1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
        1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
        1 |       1 |    4 | 03/20/2020 09:51:12.676 PDT
        1 |       1 |    5 | 03/20/2020 09:51:12.676008 PDT
        1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
        1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
        2 |       1 |    1 | 03/20/2020 09:51:12.675996 PDT
        2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
        2 |       1 |    3 | 03/20/2020 09:51:12.67602 PDT
(10 rows)
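If you need the code regularly, the same query could live in a view
(a sketch; the view name is just an example):

CREATE VIEW event_coded AS
SELECT group_id, element,
       row_number() OVER (PARTITION BY group_id, element
                          ORDER BY insert_ts) AS code,
       insert_ts
FROM event;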
Pablo
On Thu, Mar 19, 2020 at 7:56 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Michael Lewis <mlewis@xxxxxxxxxxx> writes:
> On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
> <david.g.johnston@xxxxxxxxx> wrote:
>> However, one other consideration with sequences: do you care that
>> PostgreSQL will cache/pin (i.e., not release) every single sequence
>> you touch for the lifetime of the session? (I do not think DISCARD
>> matters here, but I'm just guessing.)
> Would you expand on this point or is there someplace specific in the
> documentation on this?
I think what David is worried about is that a sequence object is a
one-row table in PG's implementation. Thus
(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).
(2) each sequence uses up 8KB on disk for its table file.
(3) each sequence you actually access within a particular session
results in creation of relcache and catcache entries in that
session's local memory. I'm not sure offhand how big those are
either, but a few KB per sequence would be a reasonable guess.
(4) each sequence competes for space in the shared-buffer arena,
since its 8K block has to be swapped into there whenever you try
to access/increment the sequence's value.
This is all mighty inefficient of course, and there's been talk
of trying to reduce the per-sequence overhead; but I don't know
of anyone actively working on that. As things stand, I think
having millions of sequences would be quite painful performance-
wise, especially if your workload were actively hitting a lot
of them concurrently. It would work, for some value of "work",
but it wouldn't perform very well.
Also, as multiple people mentioned already, this still wouldn't
guarantee gap-free sequences of ID values.
regards, tom lane
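FWIW, points (1) and (2) above are easy to see for yourself (a quick
illustration; demo_seq is just a throwaway name):

CREATE SEQUENCE demo_seq;

-- (2) the sequence's table file is one 8KB block
SELECT pg_relation_size('demo_seq');

-- (1) a taste of the catalog footprint: the sequence's pg_attribute rows
SELECT count(*) FROM pg_attribute
 WHERE attrelid = 'demo_seq'::regclass;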
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx