I would like to make a suggestion, if I may. Granted, I do not understand the underlying task at hand, but:
A table with multiple columns of the same type smacks of designs that harken back to the days of mainframes. (STOP THAT!) The data described is a non-normalized array of integers that is meaningless outside of code. Table structures should be at least a LITTLE self-descriptive.
It is also not flexible (what if you suddenly need t51? how long would that table space adjustment take in production?) and space is wasted if not all 50 columns are populated.
Use a design that is basically a storage area for name/value pairs:
create table dbo.googledocs_tbl ( id long identity primary key, — easy way to access a single record owner_id integer/long not null, — fk to owning parent record in other table such as user owner_type char(2), — optional field, identifies the owing table, makes this table even more generic property_name varchar(n) not null, — required unique name for property, not an array reference (t1, t4, t50) — the names are controlled by the developer but should be human interpretable which can then be used in queries property_value int4 not null — the important value in question );
The fields owner_id, owner_type, property_name become a tertiary key that can never be changed, are unique and easily accessible via index lookup.
Add a timestamp if need be
You could then partition the record by owner_type or owner_id or whatever else comes to mind.
Then you just have to figure out the best way to index this monster for optimized queries.
On Dec 23, 2024, at 12:31 PM, Divyansh Gupta JNsThMAudy <ag1567827@xxxxxxxxx> wrote:
Currently I haven't created those columns , I have created addons_json column which is a JSONB column yet in a discussion weather I should create or consider only one JSONB column.
Range partition can help when you applies filter for a specific range but in my case I need to apply filter on userid always, however I have date columns but there is less variation in timestamp which I have that's why didn't go for range partition.
1. I bet you'd get better performance using RANGE partitioning. 2. Twenty million rows per userid is a LOT. No subdivisions (like date range)? Adrian, Please check this out;
PARTITION BY HASH (userid);
CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0);
...
CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);
Adrian, the partition is on userid using hash partition with 84 partitions Ron, there could be more than 20 Million records possible for a single userid in that case if I create index on userid only not on other column the query is taking more than 30 seconds to return the results.
If your queries all reference userid, then you only need indices on gdid and userid. I have one confusion with this design if I opt to create 50 columns I need to create 50 index which will work with userid index in Bitmap on the other hand if I create a JSONB column I need to create a single index ?
Given what you just wrote, I'd stick with 50 separate t* columns. Simplifies queries, simplifies updates, and eliminates JSONB conversions. Values can be updated based on customer actions All rows won't have all 50 key value pairs always if I make those keys into columns the rows might have null value on the other hand if it is JSONB then the key value pair will not be there Yes in UI customers can search for the key value pairs During data population the key value pair will be empty array in case of JSONB column or NULL in case of table columns, later when customer performs some actions that time the key value pairs will populate and update, based on what action customer performs.
Let's make it more understandable, here is the table schema with 50 columns in it CREATE TABLE dbo.googledocs_tbl ( gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL, userid int8 NOT NULL, t1 int4 NULL, t2 int4 NULL, t3 int4 NULL, t4 int4 NULL, t5 int4 NULL, t6 int4 NULL, t7 int4 NULL, t8 int4 NULL, t9 int4 NULL, t10 int4 NULL, t11 int4 NULL, t12 int4 NULL, t13 int4 NULL, t14 int4 NULL, t15 int4 NULL, t16 int4 NULL, t17 int4 NULL, t18 int4 NULL, t19 int4 NULL, t20 int4 NULL, t21 int4 NULL, t22 int4 NULL, t23 int4 NULL, t24 int4 NULL, t25 int4 NULL, t26 int4 NULL, t27 int4 NULL, t28 int4 NULL, t29 int4 NULL, t30 int4 NULL, t31 int4 NULL, t32 int4 NULL, t33 int4 NULL, t34 int4 NULL, t35 int4 NULL, t36 int4 NULL, t37 int4 NULL, t38 int4 NULL, t39 int4 NULL, t40 int4 NULL, t41 int4 NULL, t42 int4 NULL, t43 int4 NULL, t44 int4 NULL, t45 int4 NULL, t46 int4 NULL, t47 int4 NULL, t48 int4 NULL, t49 int4 NULL, t50 int4 NULL, CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), ); Every time when i query I will query it along with userid Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2) more key filters if customer applies On the other hand if I create a single jsonb column the schema will look like : CREATE TABLE dbo.googledocs_tbl ( gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL, userid int8 NOT NULL, addons_json jsonb default '{}'::jsonb CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), ); and the query would be like where userid = 12345678 and ((addons_json @> {t1:1}) or
(addons_json @> {t1:2}) or
(addons_json @> {t1:3}) more key filters if customer applies
So here my question is considering one JSONB column is perfect or considering 50 columns will be more optimised.
The relational database engine is designed around the column-based approach. Especially if the columns are generally unchanging, combined with using fixed-width data types.
David J.
-- Death to <Redacted>, and butter sauce. Don't boil me, I'm still alive.
-- Death to <Redacted>, and butter sauce. Don't boil me, I'm still alive.
-- Death to <Redacted>, and butter sauce. Don't boil me, I'm still alive.
|