JSONB Array of Strings (with GIN index) versus Split Rows (B-Tree Index)

Syed Jafri <syed.jafri2@xxxxxxxxxxx> · Mon, 4 Feb 2019 05:34:57 +0000

I have a database which stores receiver to
 indicate which account the data relates to. This has led to tons of duplication of data, as one set of data may create 3 separate rows, where the only difference is the receiver column.

|---------------------|------------------|---------------------|------------------|
|      Receiver       |       Event      |         Date       
 |      Location    |
|---------------------|------------------|---------------------|------------------|
|       Alpha         |         3        |          12        
 |         USA      |
|---------------------|------------------|---------------------|------------------|
|       Bravo         |         3        |          12        
 |         USA      |
|---------------------|------------------|---------------------|------------------|
|       Charlie       |         3        |          12        
 |         USA      |
|---------------------|------------------|---------------------|------------------|

While redesigning the database, I have considered using an array with a GIN index instead of the current B-Tree index on receiver. My proposed new table would look like this:
|-------------------------------|--------------|------------|-------------------|
|           Receivers           |     Event    |    Date   
 |     Location      |
|-------------------------------|--------------|------------|-------------------|
| ["Alpha", "Bravo", "Charlie"] |       3      |     12    
 |         USA       |
|-------------------------------|--------------|------------|-------------------|

More Information:

·        
Receiver names are of the type (a-z, 1-5, .)

·        
95% of all queries currently look like this: SELECT
 * FROM table WHERE Receiver = Alpha, with the new format this would be SELECT
 * FROM table WHERE receivers @> '"Alpha"'::jsonb;

·        
The table currently contains over 4 billion rows (with duplication) and the new proposed schema would cut it down to under 2 billion rows.

·        

Question:

1.     
Does it make more sense to use Postgres Native Text Array?

2.     
Would a jsonb_path_ops GIN
 index on receivers make sense here?

3.     
Which option is more efficient? Which is faster?