I need to support the following queries:
1. Give me all documents where attribute X = 'value'.
2. Give me all documents where attribute X = 'value' and attribute Y = 'value2'.
There are about 10,000,000,000 distinct attribute values across about 10 different types (X, Y, etc.), so on average about 1,000 M (10^9) values per type.
Each attribute value may appear in 10-20 documents.
The model has to be optimized for fast reads and fast writes.
Here are two models I was thinking of:
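As a rough sanity check of the scale (plain arithmetic from the numbers above, not a benchmark), here is what the row counts come to, assuming one row per (value, document) pair in a normalized table and one row per value when all document ids are packed together:

```python
# Back-of-the-envelope sizing from the numbers in the question (assumed averages).
distinct_values = 10_000_000_000   # ~10 billion distinct attribute values
attribute_types = 10               # X, Y, ... (about 10 types)
docs_per_value_min, docs_per_value_max = 10, 20

values_per_type = distinct_values // attribute_types

# One row per (type, value, document) occurrence.
normalized_rows_min = distinct_values * docs_per_value_min
normalized_rows_max = distinct_values * docs_per_value_max

# One row per (type, value), with all document ids stored together.
packed_rows = distinct_values

print(f"values per type: {values_per_type:,}")
print(f"normalized rows: {normalized_rows_min:,} to {normalized_rows_max:,}")
print(f"packed rows: {packed_rows:,}")
```

So a one-row-per-occurrence table would hold roughly 100-200 billion rows, versus about 10 billion rows when the ids are packed per value, before counting any index size.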
Option 1: Using an RDBMS (PostgreSQL)
One big table (partitioned by type), with an index on value and an index on document id.
Columns: attr type, attr value, document id.
For query 1, it's a simple lookup.
For query 2, do a self-join on document id.
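A minimal sketch of Option 1, using Python's built-in sqlite3 as a stand-in for PostgreSQL (so no partitioning here). The table name `attrs` and the composite (attr_type, attr_value) index are my own assumptions: in PostgreSQL each per-type partition would carry its own index on value, which behaves much like the composite index below. The sample rows are made up for illustration.

```python
import sqlite3


def build_db(rows):
    """rows: iterable of (attr_type, attr_value, doc_id) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attrs (attr_type TEXT, attr_value TEXT, doc_id INTEGER)")
    # Stand-in for "partition by type + index on value" (hypothetical index names).
    conn.execute("CREATE INDEX attrs_type_value ON attrs (attr_type, attr_value)")
    conn.execute("CREATE INDEX attrs_doc ON attrs (doc_id)")
    conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)", rows)
    return conn


def query1(conn, attr_type, value):
    """Query 1: all documents where attribute attr_type = value."""
    cur = conn.execute(
        "SELECT doc_id FROM attrs WHERE attr_type = ? AND attr_value = ? ORDER BY doc_id",
        (attr_type, value),
    )
    return [row[0] for row in cur]


def query2(conn, type_a, value_a, type_b, value_b):
    """Query 2: documents matching both conditions, via a self-join on doc_id."""
    cur = conn.execute(
        """
        SELECT DISTINCT a.doc_id
        FROM attrs AS a
        JOIN attrs AS b ON b.doc_id = a.doc_id
        WHERE a.attr_type = ? AND a.attr_value = ?
          AND b.attr_type = ? AND b.attr_value = ?
        ORDER BY a.doc_id
        """,
        (type_a, value_a, type_b, value_b),
    )
    return [row[0] for row in cur]


rows = [
    ("X", "value", 1), ("X", "value", 2), ("X", "value", 3),
    ("Y", "value2", 2), ("Y", "value2", 3), ("Y", "value2", 4),
    ("X", "other", 4),
]
conn = build_db(rows)
print(query1(conn, "X", "value"))                 # [1, 2, 3]
print(query2(conn, "X", "value", "Y", "value2"))  # [2, 3]
```

Each side of the self-join is an index lookup returning only 10-20 rows here, so the join itself stays small; the main cost at this scale is the sheer size of the table and its indexes.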
Option 2:
Same as option 1, but hold all document ids for a given attribute value in one string.
For example:
'host', 'myhost' appears in '3,5,6,7,8'
For query 2:
do one query, with OR or with a UNION, for example:
select document ids from ... where (attr='X' and value='value')
union
select document ids from ... where (attr='Y' and value='value2')
and then do the set merging (intersecting the two id lists) in the application.
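A minimal sketch of Option 2, again with sqlite3 standing in for PostgreSQL. The table name `attr_docs` and the ('os', 'linux') row are hypothetical examples. It fetches both id strings in one round trip using the OR form, then intersects them in the application:

```python
import sqlite3


def build_db(rows):
    """rows: iterable of (attr_type, attr_value, doc_ids), doc_ids like '3,5,6,7,8'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attr_docs (attr_type TEXT, attr_value TEXT, doc_ids TEXT, "
        "PRIMARY KEY (attr_type, attr_value))"
    )
    conn.executemany("INSERT INTO attr_docs VALUES (?, ?, ?)", rows)
    return conn


def parse_ids(doc_ids):
    """Turn '3,5,6,7,8' into {3, 5, 6, 7, 8}; int() tolerates stray spaces."""
    return {int(part) for part in doc_ids.split(",") if part.strip()}


def query2(conn, type_a, value_a, type_b, value_b):
    """Fetch both id lists in one query (OR), then intersect them in the application."""
    cur = conn.execute(
        "SELECT attr_type, attr_value, doc_ids FROM attr_docs "
        "WHERE (attr_type = ? AND attr_value = ?) OR (attr_type = ? AND attr_value = ?)",
        (type_a, value_a, type_b, value_b),
    )
    found = {(t, v): parse_ids(ids) for t, v, ids in cur}
    ids_a = found.get((type_a, value_a), set())
    ids_b = found.get((type_b, value_b), set())
    return sorted(ids_a & ids_b)  # AND semantics = set intersection


conn = build_db([
    ("host", "myhost", "3,5,6,7,8"),
    ("os", "linux", "1,5,7,9"),  # hypothetical second attribute
])
print(query2(conn, "host", "myhost", "os", "linux"))  # [5, 7]
```

One trade-off worth noting for the write requirement: in this layout, adding a document to an attribute value is a read-modify-write of that row's whole id string, whereas Option 1 only inserts one new row.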
Other options:
Using btree_gin?
Elasticsearch?