On Thu, Jan 20, 2011 at 12:21 PM, Dotan Cohen <dotancohen@xxxxxxxxx> wrote: > I understood that. My concern is exactly with adding new nodes. There > is no incrementor (++i) in SQL, so knowingly coding a solution that > will require incrementing two fields in half the database rows seems > irresponsible. > It only requires updating the category rows. If you have several hundred categories this is a non-issue. If you have several thousand categories, you probably have millions of products, and you'd want to do some performance analysis on it. Even still, this is necessary only when adding new categories. If you are doing this often, you could leave spaces in the left and right values so that you could minimize the number of rows that need to be updated. The article makes every leaf use x and x+1 for left and right which forces another update to add a child. If instead you used x and x+20 you'd leave space for more children without any updates. This could be applied from top to bottom, starting with the root category getting 0 and MAX_INT for its values. However, it's probably not even worth applying that complexity until you prove that frequent category additions are causing problems. Most systems will be querying against the categories table far more frequently, and that's where this model pays off. If you want to see all products in category X and its subcategories, it's a single *non-recursive* query. That's huge if you are doing a lot of searches like this. But what a mess this would be if the two methods go out of sync! > Sure, but these values would be maintained by your code--not end-users. It just comes down to making sure your code is correct through appropriate unit tests. By moving the logic to a stored procedure, you can ensure the table is locked during the updates to keep two users from adding a new category simultaneously. That pays off more? For the guy writing code or for the database > memory requirement? > Performance-wise. The nested set method looks to be moderately more complex code-wise, but luckily that is done just once while querying the database is done again and again. As with all optimizations, it's best to measure and make sure there's a problem before trying to solve it. Once you've built a few hierarchical systems, you'll be able to make a gut call up front. Only two update statements, but they are affecting on average half the > database's rows! > Of a single table: categories. Hopefully you have far more items that get categorized than you do categories. > Which do you call the hierarchical model? That term is not used in the > linked article. > Well, both models are hierarchical in the sense that there's a parent-child relationship. By hierarchical here I mean that the method of implementation involves each category pointing to its parent directly via a parent_id column. Searching for all subcategories of category X requires searching first for all children, then all grandchildren, and so on, resulting in a recursive query. Using the nested sets model requires a single non-recursive query to get the same data. David