Re: Organisational question: surely someone has implemented many Boolean values (tags) and a solution exist

David Harkness <david.h@xxxxxxxxxxxxxxxxx> · Thu, 20 Jan 2011 12:58:38 -0800

On Thu, Jan 20, 2011 at 12:21 PM, Dotan Cohen <dotancohen@xxxxxxxxx> wrote:

> I understood that. My concern is exactly with adding new nodes. There
> is no incrementor (++i) in SQL, so knowingly coding a solution that
> will require incrementing two fields in half the database rows seems
> irresponsible.
>

It only requires updating the category rows. If you have several hundred
categories this is a non-issue. If you have several thousand categories, you
probably have millions of products, and you'd want to do some performance
analysis on it. Even still, this is necessary only when adding new
categories.

If you are doing this often, you could leave spaces in the left and right
values so that you could minimize the number of rows that need to be
updated. The article makes every leaf use x and x+1 for left and right which
forces another update to add a child. If instead you used x and x+20 you'd
leave space for more children without any updates. This could be applied
from top to bottom, starting with the root category getting 0 and MAX_INT
for its values.

However, it's probably not even worth applying that complexity until you
prove that frequent category additions are causing problems. Most systems
will be querying against the categories table far more frequently, and
that's where this model pays off. If you want to see all products in
category X and its subcategories, it's a single *non-recursive* query.
That's huge if you are doing a lot of searches like this.

But what a mess this would be if the two methods go out of sync!
>

Sure, but these values would be maintained by your code--not end-users. It
just comes down to making sure your code is correct through appropriate unit
tests. By moving the logic to a stored procedure, you can ensure the table
is locked during the updates to keep two users from adding a new category
simultaneously.

That pays off more? For the guy writing code or for the database
> memory requirement?
>

Performance-wise. The nested set method looks to be moderately more complex
code-wise, but luckily that is done just once while querying the database is
done again and again. As with all optimizations, it's best to measure and
make sure there's a problem before trying to solve it. Once you've built a
few hierarchical systems, you'll be able to make a gut call up front.

Only two update statements, but they are affecting on average half the
> database's rows!
>

Of a single table: categories. Hopefully you have far more items that get
categorized than you do categories.

> Which do you call the hierarchical model? That term is not used in the
> linked article.
>

Well, both models are hierarchical in the sense that there's a parent-child
relationship. By hierarchical here I mean that the method of implementation
involves each category pointing to its parent directly via a parent_id
column. Searching for all subcategories of category X requires searching
first for all children, then all grandchildren, and so on, resulting in a
recursive query.

Using the nested sets model requires a single non-recursive query to get the
same data.

David