Search Postgresql Archives

Re: making a pg store of 'multiple checkboxes' efficient

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jonathan Vanasco wrote:
> I have a large table (5M items current) that is projected to grow at the
> rate of 2.5M a month looking at the current usage trends.
> 
> the table represents some core standardized user account attributes ,
> while text heavy / unstandardized info lies in other tables.
> 
> my issue is this: i'm adding in a 'multiple checkboxes' style field, and
> trying to weigh the options for db representation against one another.
> 
> my main concern is speed - this is read heavy , but I am worried to some
> degree about disk space (not sure where disk space fits in with pg, when
> I used to use mysql the simplest schema change could drastically effect
> the disk size though ).
> 
> that said , these are my current choices:
> 
>     option a
>          bitwise operations
>             and/or operations to condense checkboxes into searchable field
>             pro:
>                 super small
>                 fits in 1 table
>             con:
>                 could not find any docs on the speed of bitwise searches
> in pg

I got some good results using bitwise operations on an integer column.
An index on such a column helps a great deal of course. What type of
integer you need depends on how many booleans you (expect to) have.

My operations were like "WHERE (value & 80) = 80" to match against the
7th and 5th bits, with value an int4 column. Query times are in the tens
of milliseconds range.

Admittedly I only got about that many records through joins with other
tables I needed (meaning that my result set may at some point in time
have been around as big as your data set), so the planner may have
reduced the number of bitwise operations significantly for me.

The actual number of bitwise values to compare was around 40,000
integers * 25 mask values, but that got joined with over 1M records from
another result set.

A bitwise operator on a single column should (theoretically) have less
overhead than integer/boolean operators on multiple columns. Computers
are good at bitwise operations, after all.

-- 
Alban Hertroys
alban@xxxxxxxxxxxxxxxxx

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux