Hello Friends,
We are newly creating tables for a system which is going to consume transaction data from customers and store in postgres version 15+ database. And it would be ~400million transactions/rows per day in the main transaction table and almost double in the multiple child tables and some child tables will hold lesser records too.
Say TABLE1 is the main or parent transaction table which will hold ~400million transactions.
TABLE2 is the child table which is going to hold ~800million rows/day. It has one to many relationships with table1.
TABLe3 is another child table which is going to hold ~200million rows per day.
We are considering all of these tables for partitioning by the same transaction_date column and it would be daily partitions. We have some questions,
1)While creating these tables and related indexes, do we need to be careful of defining any other storage parameters like tablespaces etc Or its fine to make those table/indexes aligned to the default tablespace only? and are there any constraints on tablespace size , as we will have 100's GB of data going to be stored in each of the daily partitions?
2)Should we be creating composite indexes on each foreign key for table2 and table3, because any update or delete on parent is going to take lock on all child tables?
3)We were thinking of simple Btree indexes to be created on the columns based on the search criteria of the queries. but the indexes doc i see in postgres having INCLUDE keywords also available in them. So I'm struggling to understand a bit, how it's adding value to the read query performance if those additional columns are not added explicitly to the index but are part of the INCLUDE clause? Will it give some benefit in regards to the index storage space? or should we always keep all the columns in the index definition itself other than some exception scenario? Struggling to understand the real benefit of the INCLUDE clause.
Below is a sample DDL of what it will look like.
Create table syntax:-
CREATE TABLE TABLE1
(
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
) partition by range (txn_timestamp);
CREATE TABLE TABLE2
(
table2_id varchar(36) not null,
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
) partition by range (txn_timestamp);
alter table table2 add constraint fk1 (TXN_ID,txn_timestamp) references table1(TXN_ID,txn_timestamp);
Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key
CREATE TABLE TABLE3
(
table3_id varchar(36) not null,
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
) partition by range (txn_timestamp);
alter table table2 add constraint fk2 (TXN_ID,txn_timestamp) references table1(TXN_ID,txn_timestamp);
Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign key.
We are newly creating tables for a system which is going to consume transaction data from customers and store in postgres version 15+ database. And it would be ~400million transactions/rows per day in the main transaction table and almost double in the multiple child tables and some child tables will hold lesser records too.
Say TABLE1 is the main or parent transaction table which will hold ~400million transactions.
TABLE2 is the child table which is going to hold ~800million rows/day. It has one to many relationships with table1.
TABLe3 is another child table which is going to hold ~200million rows per day.
We are considering all of these tables for partitioning by the same transaction_date column and it would be daily partitions. We have some questions,
1)While creating these tables and related indexes, do we need to be careful of defining any other storage parameters like tablespaces etc Or its fine to make those table/indexes aligned to the default tablespace only? and are there any constraints on tablespace size , as we will have 100's GB of data going to be stored in each of the daily partitions?
2)Should we be creating composite indexes on each foreign key for table2 and table3, because any update or delete on parent is going to take lock on all child tables?
3)We were thinking of simple Btree indexes to be created on the columns based on the search criteria of the queries. but the indexes doc i see in postgres having INCLUDE keywords also available in them. So I'm struggling to understand a bit, how it's adding value to the read query performance if those additional columns are not added explicitly to the index but are part of the INCLUDE clause? Will it give some benefit in regards to the index storage space? or should we always keep all the columns in the index definition itself other than some exception scenario? Struggling to understand the real benefit of the INCLUDE clause.
Below is a sample DDL of what it will look like.
Create table syntax:-
CREATE TABLE TABLE1
(
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE1_PK PRIMARY KEY (TXN_ID,txn_timestamp)
) partition by range (txn_timestamp);
CREATE TABLE TABLE2
(
table2_id varchar(36) not null,
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE2_PK PRIMARY KEY (table2_id,txn_timestamp)
) partition by range (txn_timestamp);
alter table table2 add constraint fk1 (TXN_ID,txn_timestamp) references table1(TXN_ID,txn_timestamp);
Create Index idx1 on TABLE2(TXN_ID,txn_timestamp); -- Index for foreign Key
CREATE TABLE TABLE3
(
table3_id varchar(36) not null,
TXN_ID varchar(36) NOT NULL ,
txn_timestamp date NOT NULL ,
CONSTRAINT TABLE3_PK PRIMARY KEY (table3_id,txn_timestamp)
) partition by range (txn_timestamp);
alter table table2 add constraint fk2 (TXN_ID,txn_timestamp) references table1(TXN_ID,txn_timestamp);
Create Index idx2 on TABLE3(TXN_ID,txn_timestamp); -- Index for foreign key.
Thanks And Regards
Yudhi