On Thu, Mar 19, 2020 at 5:13 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 3/19/20 3:32 PM, pabloa98 wrote:
>
>
> On Thu, Mar 19, 2020 at 3:17 PM Rob Sargent <robjsargent@xxxxxxxxx
> <mailto:robjsargent@xxxxxxxxx>> wrote:
>
>
>
>> On Mar 19, 2020, at 4:13 PM, pabloa98 <pabloa98@xxxxxxxxx
>> <mailto:pabloa98@xxxxxxxxx>> wrote:
>>
>>
>>
>> On Thu, Mar 19, 2020 at 2:50 PM Rob Sargent <robjsargent@xxxxxxxxx
>> <mailto:robjsargent@xxxxxxxxx>> wrote:
>>
>>
>>
>> > On Mar 19, 2020, at 3:36 PM, pabloa98 <pabloa98@xxxxxxxxx
>> <mailto:pabloa98@xxxxxxxxx>> wrote:
>> >
>> > Hello,
>> >
>> > My schema requires a counter for each combination of 2
>> values. Something like:
>> >
>> > CREATE TABLE counter(
>> >     "group" INT NOT NULL,  -- "group" is a reserved word, so it must be quoted
>> >     element INT NOT NULL,
>> >     seq_number INT NOT NULL DEFAULT 0,
>> >     PRIMARY KEY ("group", element)
>> > );
>> >
>> > For each entry in counter, aka for each (group, element)
>> pair, the model requires a seq_number.
>> >
>> > If I use a table "counter", I could still have counter
>> collisions between 2 transactions. I need true sequence
>> behavior. Is that possible using a table like "counter",
>> where the counter could be incremented outside the
>> transaction so it performs as a sequence, without race
>> conditions between concurrent transactions?
>> >
>> > The other option is to create sequences for each new pair of
>> (group, element) using triggers. There are millions of pairs.
>> So this approach will generate millions of sequences.
>> >
>> > How would a PostgreSQL database behave with millions of
>> sequences in a schema? Would performance degrade? Is
>> there any negative impact?
>> >
>> > Regards
>> >
>> > Pablo
>> >
>>
>> To clarify, are you hoping for consecutive numbers as each
>> row is added to the table, i.e. “serial”?
>>
>> What is the intention of “seq_number”?
>> >
>> >
>>
>>
>> the idea is to have something like a serial sequence, but one
>> for each pair of (group, element).
>>
>> so that when we insert rows in another table, we could have
>> something like:
>>
>>
>> group, element, event_id, ...
>> 1, 1, 1
>> 1, 1, 2
>> 1, 1, 3
>> 2, 1, 1
>> 1, 1, 4
>> 1, 3, 1
>> 1, 1, 5
>> 1, 3, 2
>> 2, 1, 2
>> 2, 1, 3
>>
>> The 3rd column is the sequence number we get from the appropriate
>> sequence created by the trigger.
>> I want to implement a variation of
>> https://stackoverflow.com/a/30204854 and that will generate
>> millions of sequences.
>>
>>
> Then I don’t think group/element can be a PRIMARY KEY
>
>
> Thank you for your answer. Let me explain:
>
> Table "counter" will have triggers that create a sequence when a
> new row is inserted, called counter_1_1_seq, counter_2_1_seq,
> counter_1_3_seq, etc., that will be used to insert values in **another**
> table. It will be used for nothing else. When we insert millions of
> group/element pairs, the trigger on that table will generate millions of
> sequences.
This has train wreck written all over it. If you are going to have a
trigger for each combination of (group, element) I gotta believe the
table will fall over before you run into sequence issues. Not sure why
there can't be a serial column that supplies the sequence numbers, and
therefore only one sequence in play. From what I see all you care about
is that:
group, element, event_id
has an increasing event_id for (group, element) pairs.
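Something along these lines (a sketch, assuming the per-pair numbers
only need to be derivable at read time, not stored):

CREATE TABLE event (
    "group"  INT NOT NULL,           -- "group" is reserved, hence quoted
    element  INT NOT NULL,
    event_id BIGSERIAL PRIMARY KEY   -- one global sequence for all pairs
);

-- per-pair, gap-free numbering derived when you query
SELECT "group", element,
       row_number() OVER (PARTITION BY "group", element
                          ORDER BY event_id) AS code
  FROM event;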
I need a different sequence for each pair because it is a business requirement.
Please forget about the first example. I was not clear. It will be something like this.
I have a table called "pair":
CREATE TABLE pair(
    "group" INT NOT NULL,  -- "group" is a reserved word, so it must be quoted
    element INT NOT NULL,
    PRIMARY KEY ("group", element)
);
I must add entries to the table "event". This table will have a code that is generated using a sequence that is a function of the pair (group, element).
CREATE TABLE event(
    "group" INT NOT NULL,
    element INT NOT NULL,
    code INT NOT NULL,
    PRIMARY KEY (code, element, "group")
);
I could make the column code SERIAL and be done with it. HOWEVER, having the "code" column start from 1 for each pair (group, element) is a business requirement. The business domain dictates that for each pair (group, element) there is a code that runs from 1 to 99999999. It is OK to have gaps in the sequence.
So this table "event" will contain entries like:
1, 1, 1
1, 1, 2
1, 1, 3
2, 1, 1
1, 1, 4
1, 3, 1
1, 1, 5
1, 3, 2
2, 1, 2
2, 1, 3
These triplets will be inserted by several processes, so several concurrent transactions can be running at the same time. That means a trivial calculation of the "code" column (something like SELECT MAX(code) + 1 FROM event WHERE "group" = 1 AND element = 1;) can generate collisions, because 2 transactions could arrive at the same value of "code" at the same time.
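Spelled out, the naive version (hypothetical, for illustration only) would be:

-- both transactions can read the same MAX() before either commits,
-- so both try to insert the same code and one hits the primary key
INSERT INTO event("group", element, code)
SELECT 1, 1, COALESCE(MAX(code), 0) + 1
  FROM event
 WHERE "group" = 1 AND element = 1;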
So I need something like a sequence for each pair of (group, element).
To that end, I added a trigger to the table "pair" that creates/drops these sequences each time a row is inserted or deleted.
Something like:
CREATE FUNCTION make_pair_sequence() RETURNS trigger
LANGUAGE plpgsql
AS $make_pair_sequence$
BEGIN
    IF (TG_OP = 'INSERT' OR TG_OP = 'UPDATE') THEN
        -- note: on UPDATE this creates a sequence for the NEW pair
        -- but does not drop the one for the OLD pair
        EXECUTE format('
            CREATE SEQUENCE IF NOT EXISTS code_%s_%s_seq
                INCREMENT 1
                START 1
                CACHE 1
                MINVALUE 1
                MAXVALUE 99999999 -- 8 digits
        ', NEW."group", NEW.element);
    ELSIF (TG_OP = 'DELETE') THEN
        -- NEW is not defined for DELETE; use OLD here
        EXECUTE format('
            DROP SEQUENCE IF EXISTS code_%s_%s_seq
        ', OLD."group", OLD.element);
    -- ELSIF (TG_OP = 'TRUNCATE') THEN ...
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END
$make_pair_sequence$;
DROP TRIGGER IF EXISTS make_pair_sequence
ON pair CASCADE;
CREATE TRIGGER make_pair_sequence
AFTER INSERT OR UPDATE OR DELETE ON pair
FOR EACH ROW EXECUTE FUNCTION make_pair_sequence();
Each time I insert a pair with something like:
INSERT INTO pair("group", element) VALUES (1, 1); -- trigger creates sequence code_1_1_seq
INSERT INTO pair("group", element) VALUES (1, 3); -- trigger creates sequence code_1_3_seq
INSERT INTO pair("group", element) VALUES (2, 1); -- trigger creates sequence code_2_1_seq
Later I will insert values in the "event" table with values like:
INSERT INTO event("group", element, code) VALUES (1, 1, nextval('code_1_1_seq'));
INSERT INTO event("group", element, code) VALUES (1, 3, nextval('code_1_3_seq'));
INSERT INTO event("group", element, code) VALUES (2, 1, nextval('code_2_1_seq'));
etc
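In the application the sequence name would presumably be built dynamically
rather than hardcoded; a sketch, assuming the code_<group>_<element>_seq
naming used by the trigger above:

-- the text name produced by format() must be cast to regclass for nextval()
INSERT INTO event("group", element, code)
VALUES (1, 1, nextval(format('code_%s_%s_seq', 1, 1)::regclass));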
And since there are millions of possible pairs, the database will end up with millions of sequences. The "code" value generation on table "event" will have no race conditions, and everything will work fine from the business point of view.
Now the questions are:
- Could PostgreSQL deal with millions of sequences? What will happen with maintenance commands (like VACUUM)? Will they suffer performance issues?
- Is there a better method of having independent code counters transaction-safe with no locks?
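(For comparison, the single-table variant from my first email would look
something like the sketch below. It is transaction-safe, but the row lock
serializes concurrent inserts for the same pair, which is what I hoped to
avoid.)

-- one row per (group, element), as in the "counter" table idea
UPDATE counter
   SET seq_number = seq_number + 1
 WHERE "group" = 1 AND element = 1
RETURNING seq_number;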
Sorry about the first email. I did not explain clearly what the problem was.
>
> My question is how PostgreSQL will behave. Could it deal with millions
> of sequences? What about system operations such as VACUUM, etc.?
>
> Pablo
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx