I have a database which stores receiver to
indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column. |---------------------|------------------|---------------------|------------------| | Receiver | Event | Date
| Location | |---------------------|------------------|---------------------|------------------| | Alpha | 3 | 12
| USA | |---------------------|------------------|---------------------|------------------| | Bravo | 3 | 12
| USA | |---------------------|------------------|---------------------|------------------| | Charlie | 3 | 12
| USA | |---------------------|------------------|---------------------|------------------|
While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this: |-------------------------------|--------------|------------|-------------------| | Receivers | Event | Date
| Location | |-------------------------------|--------------|------------|-------------------| | ["Alpha", "Bravo", "Charlie"] | 3 | 12
| USA | |-------------------------------|--------------|------------|-------------------|
More Information:
·
Receiver names are of the type (a-z, 1-5, .)
·
95% of all queries currently look like this: SELECT
* FROM table WHERE Receiver = Alpha, with the new format this would be SELECT
* FROM table WHERE receivers @> '"Alpha"'::jsonb;
·
The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.
·
Question:
1.
Does it make more sense to use Postgres Native Text Array?
2.
Would a jsonb_path_ops GIN
index on receivers make sense here?
3.
Which option is more efficient? Which is faster? |