while it sometimes good and extremely easy to setup, it has some drawbacks
- querying and searching has some limitations (keys/values not easily indexable, horrible syntax)
- storage not optimised (one hstore field = several dozens of boolean columns)
- only text values, no data type validation
I'd recommend option (4) - normalize!
FR
2011/12/5 Mike Christensen <mike@xxxxxxxxxxxxx>
The hstore module sounds fantastic!>> I have a database full of recipes, one recipe per row. I need to
>> store a bunch of arbitrary "flags" for each recipe to mark various
>> properties such as Gluton-Free, No meat, No Red Meat, No Pork, No
>> Animals, Quick, Easy, Low Fat, Low Sugar, Low Calorie, Low Sodium and
>> Low Carb. Users need to be able to search for recipes that contain
>> one or more of those flags by checking checkboxes in the UI.
>>
>> I'm searching for the best way to store these properties in the
>> Recipes table.
>
> I'd use hstore to store them as tags. You can then use hstore's GiST index
> support to get quick lookups.
>>
>> 1. Have a separate column for each property and create an index on
>> each of those columns. I may have upwards of about 20 of these
>> properties, so I'm wondering if there's any drawbacks with creating a
>> whole bunch of BOOL columns on a single table.
>
> It'll get frustrating as you start adding new categories, and will drive you
> insane as soon as you want to let the user define their own categories -
> which you will land up wanting to do in your problem space. I'd avoid it.
>>
>> 2. Use a bitmask for all properties and store the whole thing in one
>> numeric column that contains the appropriate number of bits. Create a
>> separate index on each bit so searches will be fast.
>
> Same as above, it'll get annoying to manage when you want user tagging.
>>
>> 3. Create an ENUM with a value for each tag, then create a column that
>> has an ARRAY of that ENUM type. I believe an ANY clause on an array
>> column can use an INDEX, but have never done this.
>
> Same again.
>>
>> 4. Create a separate table that has a one-to-many mapping of recipes
>> to tags. Each tag would be a row in this table. The table would
>> contain a link to the recipe, and an ENUM value for which tag is "on"
>> for that recipe. When querying, I'd have to do a nested SELECT to
>> filter out recipes that didn't contain at least one of these tags. I
>> think this is the more "normal" way of doing this, but it does make
>> certain queries more complicated - If I want to query for 100 recipes
>> and also display all their tags, I'd have to use an INNER JOIN and
>> consolidate the rows, or use a nested SELECT and aggregate on the fly.
>
> That'll get slow. It'll work and is IMO better than all the other options
> you suggested, but I'd probably favour hstore over it.
I'm curious as to how these columns are serialized back through the
driver, such as Npgsql. Do I get the values as strings, such as a
comma delimited key/value pair list? Or would I need to do some
custom logic to deserialize them?
Right now, I'm using Npgsql as a driver, and NHibernate/Castle
ActiveRecord as an ORM.
Mike
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general