Re: Problem with UPDATE and UNIQUE

Michael Glaesemann <grzm@xxxxxxxxxxxxxxx> · Wed, 22 Aug 2007 18:27:42 -0500

On Aug 22, 2007, at 1:02 , Frank Millman wrote:

I want to store data in a 'tree' form, with a fixed number of  
levels, so
that each level has a defined role.

First thought: fixed, predetermined levels, separate tables for each  
level. If a more general approach is desired, your options are  
generally adjacency list, nested sets, or contrib/ltree. Each has  
their own strengths and weaknesses.

I have the following (simplified) table -

CREATE TABLE treedata (
  rowid serial primary key,
  levelno int not null,
  parentid int references treedata,
  seq int not null,
  code varchar not null,
  description varchar not null
  );

rowid + parentid looks like adjacency list to me. Note that you're  
storing redundant data (the levelno, which can be derived from the  
rowid/parentid relationships), which you may want to do for  
performance reasons, but does make things more complicated: you're  
essentially caching data which brings with it problems of cache  
invalidation. In this case, you need to make sure you're updating  
levelno whenever it needs to be updated. (Which I'm sure you've  
already thought of.)

To describe each of the levels in the tree, I have the following  
table -

CREATE TABLE treelevels (
  levelno int primary key,
  code varchar unique not null,
  description varchar not null
  );

Having each level as its own table would make this redundant, but  
again, that might not fit with what you're modeling.

Typical values for this table could be -
  (0,'Prod','Product code')
  (1,'Cat','Product category')
  (2,'*','All products')

This makes me think you'll want to rethink your schema a bit, as  
you're mixing different types of data: categories and products. I'd  
at least separate this out into a products table and a categories  
table. The categories table may in fact still require some kind of  
tree structure, but I don't think products belongs as part of it.

CREATE TABLE products
(
    product_code text PRIMARY KEY
    , product_name text NOT NULL UNIQUE
    , product_description TEXT NOT NULL
);

CREATE TABLE categories
(
    category_id INTEGER PRIMARY KEY
    , category_name TEXT NOT NULL UNIQUE
    , category_description TEXT NOT NULL
    , parent_category_id INTEGER NOT NULL
        REFERENCES categories
    , level INTEGER NOT NULL
    , seq INTEGER NOT NULL
);

One difference here is that I have a NOT NULL constraint on  
parent_category_id, and I'd define the root as a category_id that has  
a parent_category = category_id. You should be able to add a check  
constraint that makes sure only level = 0 has parent_category_id =  
category_id and a unique index so only one category can have level 0.  
Something like:

-- check constraint
CHECK (
    CASE WHEN category_id = parent_category_id
        THEN level = 0
    ELSE category_id <> parent_category_id
    END)
-- unique index
CREATE UNIQUE INDEX categories_root_uniq_idx
ON categories (level)
WHERE level = 0;

These are untested, but I think they should work.

Given that you'd like to determine the order of the items, I think  
you might be interested in using nested sets rather than adjacency  
list, as this information is automatically encoded for you. This  
would look something like:

CREATE FUNCTION valid_nested_interval_bounds(INTEGER, INTEGER)
RETURNS BOOLEAN
IMMUTABLE
LANGUAGE SQL AS $body$
SELECT $1 < $2 -- make sure they're properly ordered
    AND ($2 - $1 - 1) % 2 = 0 -- useful if using the more strict  
versions of nested set (where there no gaps)
    -- and you can easily calculate the total number of descendants
    -- from just the upper and lower bounds of the parent node
$body$;

I generally separate the validity test into a separate function as I  
often have a couple different nested set hierarchies in the same  
database, and it makes the table definition clearer. I believe the  
function is inlined in the check constraint,  but I'm not sure. You  
can, of course, move the SQL in the function straight into table  
definition.

CREATE TABLE categories
(
    , category_name TEXT PRIMARY KEY
    , category_description TEXT NOT NULL
    , category_lower INTEGER NOT NULL UNIQUE -- often called "left"
    , category_upper INTEGER NOT NULL UNIQUE -- often called "right"
    , CHECK (valid_nested_interval_bounds(category_lower,  
category_upper))
);

Another advantage of nested sets is that selecting all of the  
descendants of a particular node is very easy. The downside of the  
nested set model is that inserts and updates are a bit more  
complicated and db intensive. If your categories aren't going to  
change much, this shouldn't be a problem.

Again, you could add level information into the categories table, but  
you'd need to make sure to update it appropriately.

CREATE TABLE category_products
(
    category_name TEXT NOT NULL REFERENCES categories
    , product_code TEXT NOT NULL REFERENCES products
    , UNIQUE (category_name, product_code)
);

My comments below refer to your original design.

Now for the problem. I want to insert or delete levels dynamically.  
I can
insert or delete levels in 'treedata' without a problem. However, I  
also
want to insert or delete a level in 'treelevels'.

Say I want to insert a level between 'code' and 'category' called  
'group' -

INSERT INTO treelevels VALUES (1,'Group','Product group');

It's a good habit to *always* explicitly list your columns: it's self- 
documenting and more robust in the face of schema changes.

Obviously this will fail with a duplicate levelno. Therefore before  
the
insert statement I want to do this -

UPDATE treelevels SET levelno = (levelno+1) WHERE levelno >= 1;

The problem is that if there are a number of levels, and they are in
indeterminate order, I can get duplicate level numbers while the  
command is
being executed.

My workaround at present is the following -

UPDATE treelevels SET levelno = (levelno+10001) WHERE levelno >= 1;
UPDATE treelevels SET levelno = (levelno-10000) WHERE levelno >= 1;

This is a general problem with nested sets and your situation where  
you're caching the levelno, and you're work around is similar to the  
two generally recommended solutions. One is to make updates using an  
offset such as what you're doing, and the other is to utilize  
negative levels. I'm keen on the latter, as I feel it's a bit more  
flexible: you don't need to make sure your offset is large enough.  
I'd do something like this:

BEGIN;

UPDATE treelevels
SET levelno = -1 * (levelno + 1)
WHERE levelno >= 1;

INSERT INTO treelevels (levelno, code, description) VALUES (1,  
'Group', 'Product group');

UPDATE treelevels
SET levelno = -1 * levelno
WHERE levelno < 0;

COMMIT;

And you can wrap the common operations (insert, move, delete) in  
functions for convenience.

Anyway, hope this gives you something to think about.

Michael Glaesemann
grzm seespotcode net

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
      message can get through to the mailing list cleanly