Hi All,
First, some background:
- We are using PostgreSQL 7.3.4, and am locked into this version. I would upgrade if I could, but the decision is not mine.
- The table referred to below is 120+ million rows, and has a width of 27 columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text field, and 1 double precision). This table already has 3 multi-field indexes, as well as a primary key index.
I need to add many boolean flags to my database that need to be retrieved by primary key, as well as be searched upon with decent speed. By "many", I mean about 30 booleans total. The added bonus is that some of these values need to be searched in an OR manner, while others need to be searched as an AND. For example, I will need to do a search that looks something like: select * from table where A && B && C && (D || E) && F
Our first thought was to "pack" these booleans into integer fields to save space and help with searching (so we thought). The design calls for a smallint field and an integer field to be added to the table. UPDATE/INSERT would be handled by calculating the values of the two fields outside of the database. Searching would be accomplished by taking the bits that were requested, creating a bitmask corresponding to the 2 fields, and then doing something like: select * from table where (field & BITMASK) = BITMASK. We felt that putting this information in this table rather than a separate one would help performance by eliminating a JOIN, and it was assumed (ugh) that if we built an index on those two fields, that we would be able to use those indexes for searching in the aforementioned manner.
Unfortunately we have come up with the following problems with this approach:
1. Postgres will only use 1 index per table, so putting these fields in the same table as several other searchable fields actually hurts us more than helps.
2. We haven't been able to get any index scans in preliminary testing (all seq. scans).
After all of that background, my actual question is: what is the best way to effectively store and query this type of data? It seems more and more that our initial design is just flat-out wrong, but we are at a loss as to what the other options are.
Thanks very much in advance, and I apologize for the length of the message, but I felt background info would be important.
Regards,
Richard